MICROFICHE REFERENCE

A microfiche appendix is included of a computer program
listing. The total number of microfiche is 7. The total
number of frames is 679.

BACKGROUND OF THE INVENTION 

1. Technical Field 

This invention relates to speech communication systems
and, more particularly, to systems for digital speech coding.

2. Related Art 

One prevalent mode of human communication is by the
use of communication systems. Communication systems
include both wireline and wireless radio based systems.
Wireless communication systems are electrically connected
with the wireline based systems and communicate with the
mobile communication devices using radio frequency (RF)
communication. Currently, the radio frequencies available
for communication in cellular systems, for example, are in
the cellular frequency range centered around 900 MHz and
in the personal communication services (PCS) frequency
range centered around 1900 MHz. Data and voice transmis-
sions within the wireless system have a bandwidth that
consumes a portion of the radio frequency. Due to increased
traffic caused by the expanding popularity of wireless com-
munication devices, such as cellular telephones, it is desir-
able to reduce the bandwidth of transmissions within the
wireless systems.

Digital transmission in wireless radio communications is
increasingly applied to both voice and data due to noise
immunity, reliability, compactness of equipment and the
ability to implement sophisticated signal processing func-
tions using digital techniques. Digital transmission of speech
signals involves the steps of sampling an analog speech
waveform with an analog-to-digital converter, speech com-
pression (encoding), transmission, speech decompression
(decoding), digital-to-analog conversion, and playback into
an earpiece or a loudspeaker. The sampling of the analog
speech waveform with the analog-to-digital converter cre-
ates a digital signal. However, the number of bits used in the
digital signal to represent the analog speech waveform
creates a relatively large bandwidth. For example, a speech
signal that is sampled at a rate of 8000 Hz (once every 0.125
ms), where each sample is represented by 16 bits, will result
in a bit rate of 128,000 (16x8000) bits per second, or 128
Kbps (Kilobits per second).
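
The arithmetic in the preceding example can be checked with a short
sketch. The function names are illustrative only and are not taken from
the microfiche appendix; the 8000 Hz sampling rate and the 16 bits per
sample are the values stated above.

```python
def sample_period_ms(sample_rate_hz: int) -> float:
    """Time between two consecutive samples, in milliseconds."""
    return 1000.0 / sample_rate_hz


def pcm_bit_rate_bps(sample_rate_hz: int, bits_per_sample: int) -> int:
    """Bit rate of the uncompressed digital signal, in bits per second."""
    return sample_rate_hz * bits_per_sample


if __name__ == "__main__":
    print(sample_period_ms(8000))      # 0.125
    print(pcm_bit_rate_bps(8000, 16))  # 128000
```

Any codec that transmits fewer than 128,000 bits for each second of
speech therefore reduces the bandwidth relative to this uncompressed
representation.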

Speech compression may be used to reduce the number of
bits that represent the speech signal thereby reducing the
bandwidth needed for transmission. However, speech com-
pression may result in degradation of the quality of decom-
pressed speech. In general, a higher bit rate will result in
higher quality, while a lower bit rate will result in lower
quality. However, modern speech compression techniques,
such as coding techniques, can produce decompressed
speech of relatively high quality at relatively low bit rates.
In general, modern coding techniques attempt to represent
the perceptually important features of the speech signal,
without preserving the actual speech waveform.

One coding technique used to lower the bit rate involves
varying the degree of speech compression (i.e. varying the
bit rate) depending on the part of the speech signal being
compressed. Typically, parts of the speech signal for which
adequate perceptual representation is more difficult (such as
voiced speech, plosives, or voiced onsets) are coded and
transmitted using a higher number of bits. Conversely, parts
of the speech for which adequate perceptual representation
is less difficult (such as unvoiced, or the silence between
words) are coded with a lower number of bits. The resulting
average bit rate for the speech signal will be relatively lower
than would be the case for a fixed bit rate that provides
decompressed speech of similar quality.

Speech compression systems, commonly called codecs,
include an encoder and a decoder and may be used to reduce
the bit rate of digital speech signals. Numerous algorithms
have been developed for speech codecs that reduce the
number of bits required to digitally encode the original
speech while attempting to maintain high quality recon-
structed speech. Code-Excited Linear Predictive (CELP)
coding techniques, as discussed in the article entitled "Code-
Excited Linear Prediction: High-Quality Speech at Very
Low Rates," by M. R. Schroeder and B. S. Atal, Proc.
ICASSP-85, pages 937-940, 1985, provide one effective
speech coding algorithm. An example of a variable rate
CELP based speech coder is TIA (Telecommunications
Industry Association) IS-127 standard that is designed for
CDMA (Code Division Multiple Access) applications. The
CELP coding technique utilizes several prediction tech-
niques to remove the redundancy from the speech signal.
The CELP coding approach is frame-based in the sense that
it stores sampled input speech signals into a block of
samples called frames. The frames of data may then be
processed to create a compressed speech signal in digital
form.

The CELP coding approach uses two types of predictors,
a short-term predictor and a long-term predictor. The short-
term predictor typically is applied before the long-term
predictor. A prediction error derived from the short-term
predictor is commonly called short-term residual, and a
prediction error derived from the long-term predictor is
commonly called long-term residual. The long-term residual
may be coded using a fixed codebook that includes a
plurality of fixed codebook entries or vectors. One of the
entries may be selected and multiplied by a fixed codebook
gain to represent the long-term residual. The short-term
predictor also can be referred to as an LPC (Linear Predic-
tion Coding) or a spectral representation, and typically
comprises 10 prediction parameters. The long-term predic-
tor also can be referred to as a pitch predictor or an adaptive
codebook and typically comprises a lag parameter and a
long-term predictor gain parameter. Each lag parameter also
can be called a pitch lag, and each long-term predictor gain
parameter can also be called an adaptive codebook gain. The
lag parameter defines an entry or a vector in the adaptive
codebook.

The CELP encoder performs an LPC analysis to deter-
mine the short-term predictor parameters. Following the
LPC analysis, the long-term predictor parameters may be
determined. In addition, determination of the fixed codebook
entry and the fixed codebook gain that best represent the
long-term residual occurs. The powerful concept of
analysis-by-synthesis (ABS) is employed in CELP coding.
In the ABS approach, the best contribution from the fixed
codebook, the best fixed codebook gain, and the best long-
term predictor parameters may be found by synthesizing
them using an inverse prediction filter and applying a
perceptual weighting measure. The short-term (LPC) pre-
diction coefficients, the fixed-codebook gain, as well as the
lag parameter and the long-term gain parameter may then be
quantized. The quantization indices, as well as the fixed
codebook indices, may be sent from the encoder to the
decoder.

The CELP decoder uses the fixed codebook indices to
extract a vector from the fixed codebook. The vector may be
multiplied by the fixed-codebook gain, to create a long-term
excitation also known as a fixed codebook contribution. A
long-term predictor contribution may be added to the long-
term excitation to create a short-term excitation that com-
monly is referred to simply as an excitation. The long-term
predictor contribution comprises the short-term excitation
from the past multiplied by the long-term predictor gain. The
addition of the long-term predictor contribution alternatively
can be viewed as an adaptive codebook contribution or as a
long-term (pitch) filtering. The short-term excitation may be
passed through a short-term inverse prediction filter (LPC)
that uses the short-term (LPC) prediction coefficients quan-
tized by the encoder to generate synthesized speech. The
synthesized speech may then be passed through a post-filter
that reduces perceptual coding noise.

These speech compression techniques have resulted in
lowering the amount of bandwidth used to transmit a speech
signal. However, further reducing the bandwidth is particularly
important in a communication system that has to allocate its
resources to a large number of users. Accordingly, there is a
need for systems and methods of speech coding that are
capable of minimizing the average bit rate needed for speech
representation, while providing high quality decompressed
speech.

SUMMARY

This invention provides systems for encoding and decod-
ing speech signals. The embodiments may use the CELP
coding technique and prediction based coding as a frame-
work to employ signal-processing functions using waveform
matching and perceptual related techniques. These tech-
niques allow the generation of synthesized speech that
closely resembles the original speech by including percep-
tual features while maintaining a relatively low bit rate. One
application of the embodiments is in wireless communica-
tion systems. In this application, the encoding of original
speech, or the decoding to generate synthesized speech, may
occur at mobile communication devices. In addition, encod-
ing and decoding may occur within wireline-based systems
or within other wireless communication systems to provide
interfaces to wireline-based systems.

One embodiment of a speech compression system
includes a full-rate codec, a half-rate codec, a quarter-rate
codec and an eighth-rate codec each capable of encoding
and decoding speech signals. The full-rate, half-rate,
quarter-rate and eighth-rate codecs encode the speech sig-
nals at bit rates of 8.5 Kbps, 4 Kbps, 2 Kbps and 0.8 Kbps,
respectively. The speech compression system performs a
rate selection on a frame of a speech signal to select one of
the codecs. The rate selection is performed on a frame-by-
frame basis. Frames are created by dividing the speech
signal into segments of a finite length of time. Since each
frame may be coded with a different bit rate, the speech
compression system is a variable-rate speech compression
system that codes the speech at an average bit rate.

The rate selection is determined by characterization of
each frame of the speech signal based on the portion of the
speech signal contained in the particular frame. For
example, frames may be characterized as stationary voiced,
non-stationary voiced, unvoiced, background noise, silence
etc. In addition, the rate selection is based on a Mode that the
speech compression system is operating within. The differ-
ent Modes indicate the desired average bit rate. The codecs
are designed for optimized coding within the different
characterizations of the speech signals. Optimal coding
balances the desire to provide synthesized speech of the
highest perceptual quality while maintaining the desired
average bit rate, thereby maximizing use of the available 
bandwidth. During operation, the speech compression sys- 
tem selectively activates the codecs based on the Mode as 
well as characterization of the frame in an attempt to 
optimize the perceptual quality of the synthesized speech. 
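
How a frame-by-frame choice of codec leads to an average bit rate can
be sketched as follows. The four bit rates are those stated in this
summary; the function name and the example sequence of frames are
hypothetical, and frames of equal duration are assumed.

```python
# Bit rates of the four codecs in Kbps, as stated in the summary.
CODEC_KBPS = {"full": 8.5, "half": 4.0, "quarter": 2.0, "eighth": 0.8}


def average_bit_rate_kbps(frame_codecs):
    """Average bit rate over a sequence of equal-length frames.

    frame_codecs lists the codec name chosen for each frame.
    """
    if not frame_codecs:
        raise ValueError("at least one frame is required")
    total = sum(CODEC_KBPS[name] for name in frame_codecs)
    return total / len(frame_codecs)


if __name__ == "__main__":
    # Hypothetical selection for six consecutive frames.
    frames = ["full", "half", "half", "quarter", "eighth", "eighth"]
    print(round(average_bit_rate_kbps(frames), 3))
```

Shifting the mix toward the lower-rate codecs lowers the average, which
is the lever that a Mode is described as controlling.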

Once the full or the half-rate codec is selected by the rate 
selection, a type classification of the speech signal occurs to 
further optimize coding. The type classification may be a 
first type (i.e. a Type One) for frames containing a harmonic 
structure and a formant structure that do not change rapidly 
or a second type (i.e. a Type Zero) for all other frames. The 
bit allocation of the full-rate and half-rate codecs may be 
adjusted in response to the type classification to further 
optimize the coding of the frame. The adjustment of the bit 
allocation provides improved perceptual quality of the 
reconstructed speech signal by emphasizing different 
aspects of the speech signal within each frame. 
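
The order of the two decisions described above, rate selection first
and type classification only for the full-rate and half-rate codecs,
can be sketched as a small function. The names and the boolean input
are hypothetical; the analysis that decides whether the harmonic and
formant structure changes rapidly is not shown.

```python
from typing import Optional


def classify_frame_type(selected_rate: str, slowly_varying: bool) -> Optional[int]:
    """Return 1 for Type One, 0 for Type Zero, or None if no type applies.

    slowly_varying is assumed to be True when the harmonic structure and
    the formant structure of the frame do not change rapidly.
    """
    if selected_rate not in ("full", "half"):
        # In this sketch only full-rate and half-rate frames are typed.
        return None
    return 1 if slowly_varying else 0


if __name__ == "__main__":
    print(classify_frame_type("full", True))
    print(classify_frame_type("half", False))
    print(classify_frame_type("eighth", True))
```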

Accordingly, the speech coder is capable of selectively 
activating the codecs to maximize the overall quality of a 
reconstructed speech signal while maintaining the desired 
average bit rate. Other systems, methods, features and 
advantages of the invention will be or will become apparent 
to one with skill in the art upon examination of the following 
figures and detailed description. It is intended that all such 
additional systems, methods, features and advantages be 
included within this description, be within the scope of the 
invention, and be protected by the accompanying claims. 

BRIEF DESCRIPTION OF THE FIGURES 

The components in the figures are not necessarily to scale,
emphasis instead being placed upon illustrating the princi-
ples of the invention. Moreover, in the figures, like reference
numerals designate corresponding parts throughout the dif-
ferent views.

FIG. 1 is a block diagram of one embodiment of a speech 
compression system. 

FIG. 2 is an expanded block diagram of one embodiment 
of the encoding system illustrated in FIG. 1. 

FIG. 3 is an expanded block diagram of one embodiment 
of the decoding system illustrated in FIG. 1. 

FIG. 4 is a table illustrating the bit allocation of one 
embodiment of the full-rate codec. 

FIG. 5 is a table illustrating the bit allocation of one 
embodiment of the half-rate codec. 

FIG. 6 is a table illustrating the bit allocation of one 
embodiment of the quarter-rate codec. 

FIG. 7 is a table illustrating the bit allocation of one 
embodiment of the eighth-rate codec. 

FIG. 8 is an expanded block diagram of one embodiment 
of the pre-processing module illustrated in FIG. 2. 

FIG. 9 is an expanded block diagram of one embodiment
of the initial frame-processing module illustrated in FIG. 2
for the full and half-rate codecs.

FIG. 10 is an expanded block diagram of one embodiment
of the first sub-frame processing module illustrated in FIG.
2 for the full and half-rate codecs.

FIG. 11 is an expanded block diagram of one embodiment 
of the first frame processing module, the second sub-frame 
processing module and the second frame processing module 
illustrated in FIG. 2 for the full and half-rate codecs. 

FIG. 12 is an expanded block diagram of one embodiment 
of the decoding system illustrated in FIG. 3 for the full and 
half-rate codecs.

DETAILED DESCRIPTION OF THE
PREFERRED EMBODIMENTS

The embodiments are discussed with reference to speech
signals; however, processing of any other signal is possible.
It will also be understood that the numerical values disclosed
may be numerically represented by floating point, fixed
point, decimal, or other similar numerical representation that
may cause slight variation in the values but will not com-
promise functionality. Further, functional blocks identified
as modules are not intended to represent discrete structures
and may be combined or further sub-divided in various
embodiments.
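
As one illustration of the slight variation mentioned above, the sketch
below rounds a floating point value to a 16-bit fixed point
representation and converts it back. The Q15 format (15 fractional
bits) and the function names are assumptions made for this example; the
description does not specify a particular fixed point format.

```python
Q15_SCALE = 1 << 15  # 32768 steps per unit in the assumed Q15 format


def to_q15(x: float) -> int:
    """Round x to the nearest Q15 integer, saturating to the 16-bit range."""
    q = int(round(x * Q15_SCALE))
    return max(-Q15_SCALE, min(Q15_SCALE - 1, q))


def from_q15(q: int) -> float:
    """Convert a Q15 integer back to floating point."""
    return q / Q15_SCALE


if __name__ == "__main__":
    value = 0.3
    restored = from_q15(to_q15(value))
    # The difference is the small representation error.
    print(to_q15(value), abs(restored - value))
```

For values inside the representable range the round trip differs from
the original by at most half of one quantization step, which is the
kind of small deviation the paragraph above treats as harmless.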

FIG. 1 is a block diagram of one embodiment of the
speech compression system 10. The speech compression
system 10 includes an encoding system 12, a communica-
tion medium 14 and a decoding system 16 that may be
connected as illustrated. The speech compression system 10
may be any system capable of receiving and encoding a
speech signal 18, and then decoding it to create post-
processed synthesized speech 20. In a typical communica-
tion system, the wireless communication system is electri-
cally connected with a public switched telephone network
(PSTN) within the wireline-based communication system.
Within the wireless communication system, a plurality of
base stations are typically used to provide radio communi-
cation with mobile communication devices such as a cellular
telephone or a portable radio transceiver.

The speech compression system 10 operates to receive the
speech signal 18. The speech signal 18 emitted by a sender
(not shown) can be, for example, captured by a microphone
(not shown) and digitized by an analog-to-digital converter
(not shown). The sender may be a human voice, a musical
instrument or any other device capable of emitting analog
signals. The speech signal 18 can represent any type of
sound, such as voiced speech, unvoiced speech, background
noise, silence, music etc.

The encoding system 12 operates to encode the speech
signal 18. The encoding system 12 may be part of a mobile
communication device, a base station or any other wireless
or wireline communication device that is capable of receiv-
ing and encoding speech signals 18 digitized by an analog-
to-digital converter. The wireline communication devices
may include Voice over Internet Protocol (VoIP) devices and
systems. The encoding system 12 segments the speech
signal 18 into frames to generate a bitstream. One embodi-
ment of the speech compression system 10 uses frames that
comprise 160 samples that, at a sampling rate of 8000 Hz,
correspond to 20 milliseconds per frame. The frames rep-
resented by the bitstream may be provided to the commu-
nication medium 14.
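
The segmentation into frames can be sketched as follows. The frame size
of 160 samples and the 8000 Hz sampling rate are the values given
above; the function names are illustrative, and discarding a trailing
partial frame is an assumption of this sketch rather than something the
description states.

```python
def frame_duration_ms(frame_size: int = 160, sample_rate_hz: int = 8000) -> float:
    """Duration of one frame in milliseconds."""
    return 1000.0 * frame_size / sample_rate_hz


def split_into_frames(samples, frame_size: int = 160):
    """Split a sequence of samples into consecutive full frames.

    Samples left over after the last full frame are dropped (assumption).
    """
    usable = len(samples) - len(samples) % frame_size
    return [samples[i:i + frame_size] for i in range(0, usable, frame_size)]


if __name__ == "__main__":
    print(frame_duration_ms())               # 20.0
    frames = split_into_frames(list(range(400)))
    print(len(frames), len(frames[0]))       # 2 160
```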

The communication medium 14 may be any transmission
mechanism, such as a communication channel, radio waves,
microwave, wire transmissions, fiber optic transmissions, or
any medium capable of carrying the bitstream generated by
the encoding system 12. The communication medium 14
may also include transmitting devices and receiving devices
used in the transmission of the bitstream. An example
embodiment of the communication medium 14 can include
communication channels, antennas and associated transceiv-
ers for radio communication in a wireless communication
system. The communication medium 14 also can be a
storage mechanism, such as, a memory device, a storage
media or other device capable of storing and retrieving the
bitstream generated by the encoding system 12. The com-
munication medium 14 operates to transmit the bitstream
generated by the encoding system 12 to the decoding system
16.

The decoding system 16 receives the bitstream from the
communication medium 14. The decoding system 16 may be
part of a mobile communication device, a base station or
other wireless or wireline communication device that is
capable of receiving the bitstream. The decoding system 16
operates to decode the bitstream and generate the post-
processed synthesized speech 20 in the form of a digital
signal. The post-processed synthesized speech 20 may then
be converted to an analog signal by a digital-to-analog
converter (not shown). The analog output of the digital-to-
analog converter may be received by a receiver (not shown)
that may be a human ear, a magnetic tape recorder, or any
other device capable of receiving an analog signal.
Alternatively, a digital recording device, a speech recogni-
tion device, or any other device capable of receiving a digital
signal may receive the post-processed synthesized speech
20.

One embodiment of the speech compression system 10
also includes a Mode line 21. The Mode line 21 carries a
Mode signal that controls the speech compression system 10
by indicating the desired average bit rate for the bitstream.
The Mode signal may be generated externally by, for
example, a wireless communication system using a Mode
signal generation module. The Mode signal generation mod-
ule determines the Mode signal based on a plurality of
factors, such as, the desired quality of the post-processed
synthesized speech 20, the available bandwidth, the services
contracted by a user or any other relevant factor. The Mode
signal is controlled and selected by the communication
system that the speech compression system 10 is operating
within. The Mode signal may be provided to the encoding
system 12 to aid in the determination of which of a plurality
of codecs may be activated within the encoding system 12.
The codecs comprise an encoder portion and a decoder
portion that are located within the encoding system 12 and
the decoding system 16, respectively. In one embodiment of
the speech compression system 10 there are four codecs,
namely, a full-rate codec 22, a half-rate codec 24, a quarter-
rate codec 26, and an eighth-rate codec 28. Each of the
codecs 22, 24, 26, and 28 is operable to generate the
bitstream. The size of the bitstream generated by each codec
22, 24, 26, and 28, and hence the bandwidth or capacity
needed for transmission of the bitstream via the communi-
cation medium 14 is different.

In one embodiment, the full-rate codec 22, the half-rate
codec 24, the quarter-rate codec 26 and the eighth-rate codec
28 generate 170 bits, 80 bits, 40 bits and 16 bits,
respectively, per frame. The size of the bitstream of each
frame corresponds to a bit rate, namely, 8.5 Kbps for the
full-rate codec 22, 4.0 Kbps for the half-rate codec 24, 2.0
Kbps for the quarter-rate codec 26, and 0.8 Kbps for the
eighth-rate codec 28. However, fewer or more codecs as
well as other bit rates are possible in alternative embodi-
ments. By processing the frames of the speech signal 18 with
the various codecs, an average bit rate is achieved. The
encoding system 12 determines which of the codecs 22, 24,
26, and 28 may be used to encode a particular frame based
on characterization of the frame, and on the desired average
bit rate provided by the Mode signal. Characterization of a
frame is based on the portion of the speech signal 18
contained in the particular frame. For example, frames may
be characterized as stationary voiced, non-stationary voiced,
unvoiced, onset, background noise, silence etc.

The Mode signal on the Mode signal line 21 in one
embodiment identifies a Mode 0, a Mode 1, and a Mode 2.
Each of the three Modes provides a different desired average
bit rate that can vary the percentage of usage of each of the
codecs 22, 24, 26, and 28. Mode 0 may be referred to as a
premium mode in which most of the frames may be coded
with the full-rate codec 22; fewer of the frames may be
coded with the half-rate codec 24; and frames comprising
silence and background noise may be coded with the
quarter-rate codec 26 and the eighth-rate codec 28. Mode 1
may be referred to as a standard mode in which frames with
high information content, such as onset and some voiced
frames, may be coded with the full-rate codec 22. In
addition, other voiced and unvoiced frames may be coded
with the half-rate codec 24, some unvoiced frames may be
coded with the quarter-rate codec 26, and silence and
stationary background noise frames may be coded with the
eighth-rate codec 28.

Mode 2 may be referred to as an economy mode in which
only a few frames of high information content may be coded
with the full-rate codec 22. Most of the frames in Mode 2
may be coded with the half-rate codec 24 with the exception
of some unvoiced frames that may be coded with the
quarter-rate codec 26. Silence and stationary background
noise frames may be coded with the eighth-rate codec 28 in
Mode 2. Accordingly, by varying the selection of the codecs
22, 24, 26, and 28 the speech compression system 10 can
deliver reconstructed speech at the desired average bit rate
while attempting to maintain the highest possible quality.
Additional Modes, such as, a Mode three operating in a
super economy Mode or a half-rate max Mode in which the
maximum codec activated is the half-rate codec 24 are
possible in alternative embodiments.

Further control of the speech compression system 10 also
may be provided by a half rate signal line 30. The half rate
signal line 30 provides a half rate signaling flag. The half
rate signaling flag may be provided by an external source
such as a wireless communication system. When activated,
the half rate signaling flag directs the speech compression
system 10 to use the half-rate codec 24 as the maximum rate.
Determination of when to activate the half rate signaling flag
is performed by the communication system that the speech
compression system 10 is operating within. Similar to the
Mode signal determination, a half rate-signaling module
controls activation of the half rate signaling flag based on a
plurality of factors that are determined by the communica-
tion system. In alternative embodiments, the half rate sig-
naling flag could direct the speech compression system 10 to
use one codec 22, 24, 26, and 28 in place of another or
identify one or more of the codecs 22, 24, 26, and 28 as the
maximum or minimum rate.

In one embodiment of the speech compression system 10,
the full and half-rate codecs 22 and 24 may be based on an
eX-CELP (extended CELP) approach and the quarter and
eighth-rate codecs 26 and 28 may be based on a perceptual
matching approach. The eX-CELP approach extends the
traditional balance between perceptual matching and wave-
form matching of traditional CELP. In particular, the
eX-CELP approach categorizes the frames using a rate
selection and a type classification that will be described later.
Within the different categories of frames, different encoding
approaches may be utilized that have different perceptual
matching, different waveform matching, and different bit
assignments. The perceptual matching approach of the
quarter-rate codec 26 and the eighth-rate codec 28 does not use
waveform matching and instead concentrates on the percep-
tual aspects when encoding frames.

The coding of each frame with either the eX-CELP
approach or the perceptual matching approach may be based
on further dividing the frame into a plurality of subframes.
The subframes may be different in size and in number for
each codec 22, 24, 26, and 28. In addition, with respect to
the eX-CELP approach, the subframes may be different for
each category. Within the subframes, speech parameters and
waveforms may be coded with several predictive and non-
predictive scalar and vector quantization techniques. In
scalar quantization a speech parameter or element may be
represented by an index location of the closest entry in a
representative table of scalars. In vector quantization several
speech parameters may be grouped to form a vector. The
vector may be represented by an index location of the closest
entry in a representative table of vectors.

In predictive coding, an element may be predicted from
the past. The element may be a scalar or a vector. The
prediction error may then be quantized, using a table of
scalars (scalar quantization) or a table of vectors (vector
quantization). The eX-CELP coding approach, similarly to
traditional CELP, uses the powerful Analysis-by-Synthesis
(ABS) scheme for choosing the best representation for
several parameters. In particular, the parameters may be the
adaptive codebook, the fixed codebook, and their corre-
sponding gains. The ABS scheme uses inverse prediction
filters and perceptual weighting measures for selecting the
best codebook entries.

One implementation of an embodiment of the speech
compression system 10 may be in a signal-processing device
such as a Digital Signal Processing (DSP) chip, a mobile
communication device or a radio transmission base station.
The signal-processing device may be programmed with
source code. The source code may be first translated into
fixed point, and then translated into the programming lan-
guage that is specific to the signal-processing device. The
translated source code may then be downloaded and run in
the signal-processing device. One example of source code is
the C language computer program utilized by one embodi-
ment of the speech compression system 10 that is included
in the attached microfiche appendix as Appendix A and B.

FIG. 2 is a more detailed block diagram of the encoding
system 12 illustrated in FIG. 1. One embodiment of the
encoding system 12 includes a pre-processing module 34, a
full-rate encoder 36, a half-rate encoder 38, a quarter-rate
encoder 40 and an eighth-rate encoder 42 that may be
connected as illustrated. The rate encoders 36, 38, 40, and 42
include an initial frame-processing module 44 and an
excitation-processing module 54.

The speech signal 18 received by the encoding system 12
is processed on a frame level by the pre-processing module
34. The pre-processing module 34 is operable to provide
initial processing of the speech signal 18. The initial pro-
cessing can include filtering, signal enhancement, noise
removal, amplification and other similar techniques capable
of optimizing the speech signal 18 for subsequent encoding.

The full, half, quarter and eighth-rate encoders 36, 38, 40,
and 42 are the encoding portion of the full, half, quarter and
eighth-rate codecs 22, 24, 26, and 28, respectively. The
initial frame-processing module 44 performs initial frame
processing, speech parameter extraction and determines
which of the rate encoders 36, 38, 40, and 42 will encode a
particular frame. The initial frame-processing module 44
may be illustratively sub-divided into a plurality of initial
frame processing modules, namely, an initial full frame
processing module 46, an initial half frame-processing mod-
ule 48, an initial quarter frame-processing module 50 and an
initial eighth frame-processing module 52. However, it
should be noted that the initial frame-processing module 44
performs processing that is common to all the rate encoders
36, 38, 40, and 42 and particular processing that is particular
to each rate encoder 36, 38, 40, and 42. The sub-division of
the initial frame-processing module 44 into the respective
initial frame processing modules 46, 48, 50, and 52 corre-
sponds to a respective rate encoder 36, 38, 40, and 42.

The initial frame-processing module 44 performs com-
mon processing to determine a rate selection that activates
one of the rate encoders 36, 38, 40, and 42. In one
embodiment, the rate selection is based on the characteriza-
tion of the frame of the speech signal 18 and the Mode the
speech compression system 10 is operating within. Activa-
tion of one of the rate encoders 36, 38, 40, and 42 corre-
spondingly activates one of the initial frame-processing
modules 46, 48, 50, and 52.

The particular initial frame-processing module 46, 48, 50,
and 52 is activated to encode aspects of the speech signal 18
that are common to the entire frame. The encoding by the
initial frame-processing module 44 quantizes parameters of
the speech signal 18 contained in a frame. The quantized
parameters result in generation of a portion of the bitstream.
In general, the bitstream is the compressed representation of
a frame of the speech signal 18 that has been processed by
the encoding system 12 through one of the rate encoders 36,
38, 40, and 42.

In addition to the rate selection, the initial frame-
processing module 44 also performs processing to determine
a type classification for each frame that is processed by the
full and half-rate encoders 36 and 38. The type classification
of one embodiment classifies the speech signal 18 repre-
sented by a frame as a first type (i.e., a Type One) or as a
second type (i.e., a Type Zero). The type classification of one
embodiment is dependent on the nature and characteristics
of the speech signal 18. In an alternate embodiment, addi-
tional type classifications and supporting processing may be
provided.

Type One classification includes frames of the speech
signal 18 that exhibit stationary behavior. Frames exhibiting
stationary behavior include a harmonic structure and a
formant structure that do not change rapidly. All other
frames may be classified with the Type Zero classification.
In alternative embodiments, additional type classifications
may classify frames into additional classification based on
time-domain, frequency domain, etc. The type classification
optimizes encoding by the initial full-rate frame-processing
module 46 and the initial half-rate frame-processing module
48, as will be later described. In addition, both the type
classification and the rate selection may be used to optimize
encoding by portions of the excitation-processing module 54
that correspond to the full and half-rate encoders 36 and 38.

One embodiment of the excitation-processing module 54
may be sub-divided into a full-rate module 56, a half-rate
module 58, a quarter-rate module 60, and an eighth-rate
module 62. The rate modules 56, 58, 60, and 62 correspond
to the rate encoders 36, 38, 40, and 42 as illustrated in FIG.
2. The full and half-rate modules 56 and 58 of one embodi-
ment both include a plurality of frame processing modules
and a plurality of subframe processing modules that provide
substantially different encoding as will be discussed.

The portion of the excitation processing module 54 for
both the full and half-rate encoders 36 and 38 includes type
selector modules, first subframe processing modules, second
subframe processing modules, first frame processing mod-
ules and second frame processing modules. More
specifically, the full-rate module 56 includes an F type
selector module 68, an F0 first subframe processing module
70, an F1 first frame-processing module 72, an F1 second
subframe processing module 74 and an F1 second frame-
processing module 76. The term "F" indicates full-rate, and
"0" and "1" signify Type Zero and Type One, respectively.

decoding system 16 includes a full-rate decoder 90, a
half-rate decoder 92, a quarter-rate decoder 94, an eighth-
Similarly, the half-rate module 58 includes an H type rate decoder 96, a synthesis filter module 98 and a post- 
selector module 78, an HO first subframe processing module processing module 100. The full, half, quarter and eighth - 
80, an HI first frame -processing module 82, an HI second 5 rate decoders 90, 92, 94, and 96, the synthesis filter module 
subframe processing module 84, and an HI second frame- 98 and the post-processing module 100 are the decoding 
processing module 86. portion of the full, half, quarter and eighth-rate codecs 22, 

The F and H type selector modules 68, 78 direct the 24, 26, and 28. 

processing of the speech signals 18 to further optimize the The decoders 90, 92, 94, and 96 receive the bitstream and 

encoding process based on the type classification. Classifi- 1° decode the digital signal to reconstruct different parameters 

cation as Type One indicates the frame contains a harmonic of the speech signal 18. The decoders 90, 92, 94, and 96 may 

structure and a form ant structure that do not change rapidly, be activated to decode each frame based on the rate selec- 

such as stationary voiced speech. Accordingly, the bits used tion. The rate selection may be provided from the encoding 

to represent a frame classified as Type One may be allocated system 12 to the decoding system 16 by a separate infor- 

to facilitate encoding that takes advantage of these aspects in 35 mation transmittal mechanism, such as a control channel in 

representing the frame. Classification as Type Zero indicates a wireless communication system. In this example 

the frame may exhibit non-stationary behavior, for example, embodiment, the rate selection may be provided to the 

a harmonic structure and a formant structure that changes mobile communication devices as part of broadcast beacon 

rapidly or the frame may exhibit stationary unvoiced or signals generated by the base stations within the wireless 

noise-like characteristics. The bit allocation for frames clas- 20 communications system. In general, the broadcast beacon 

sified as Type Zero may be consequently adjusted to better signals are generated to provide identifying information 

represent and account for this behavior. used to establish communications between the base stations 

For the full rate module 56, the F0 first subframe- and the mobile communication devices, 
processing module 70 generates a portion of the bitstream The synthesis filter 98 and the post-processing module 
when the frame being processed is classified as Type Zero. 25 100 are part of the decoding process for each of the decoders 
Type Zero classification of a frame activates the F0 first 90, 92, 94, and 96. Assembling the parameters of the speech 
sub frame-processing module 70 to process the frame on a signal 18 that are decoded by the decoders 90, 92, 94, and 
subframe basis. The Fl first frame-processing module 72, 96 using the synthesis filter 98, generates synthesized 
the Fl second subframe processing module 74, and the Fl speech. The synthesized speech is passed through the post- 
second frame-processing modules 76 combine to generate a 30 processing module 100 to create the post-processed synthe- 
portion of the bitstream when the frame being processed is sized speech 20. 

classified as Type One. Type One classification involves One embodiment of the full-rate decoder 90 includes an 

both subframe and frame processing within the full rate F type selector 102 and a plurality of excitation reconstruc- 

module 56. ^ tion modules. The excitation reconstruction modules com- 

Similarly, for the half rate module 58, the HO first prise an F0 excitation reconstruction module 104 and an Fl 
sub frame-processing module 80 generates a portion of the excitation reconstruction module 106. In addition, the full- 
bitstream on a sub -frame basis when the frame being pro- rate decoder 90 includes a linear prediction coefficient 
cessed is classified as Type Zero. Further, the HI first (LPC) reconstruction module 107. The LPC reconstruction 
frame-processing module 82, the HI second subframe pro- 4Q module 107 comprises an F0 LPC reconstruction module 
cessing module 84, and the HI second frame -processing 108 and an Fl LPC reconstruction module 110. 
module 86 combine to generate a portion of the bitstream Similarly, one embodiment of the half-rate decoder 92 
when the frame being processed is classified as Type One. includes an H type selector 112 and a plurality of excitation 
As in the full rate module 56, the Type One classification reconstruction modules. The excitation reconstruction mod- 
involves both subframe and frame processing. 45 u l e s comprise an HO excitation reconstruction module 114 

The quarter and eighth-rate modules 60 and 62 are part of and an HI excitation reconstruction module 116. In addition, 

the quarter and eighth-rate encoders 40 and 42, respectively, the half-rate decoder 92 comprises a linear prediction coef- 

and do not include the type classification. The type classi- ficient (LPC) reconstruction module that is an H LPC 

ficalion is not included due to the nature of the frames that reconstruction module 118. Although similar in concept, the 

are processed. The quarter and eighth -rate modules 60 and 50 full and half-rate decoders 90 and 92 are designated to 

62 generate a portion of the bitstream on a subframe basis decode bitstreams from the corresponding full and half-rate 

and a frame basis, respectively, when activated. encoders 36 and 38, respectively. 

The rate modules 56, 58, 60, and 62 generate a portion of The F and H type selectors 102 and 112 selectively 
the bitstream that is assembled with a respective portion of activate respective portions of the full and half-rate decoders 
the bitstream that is generated by the initial frame processing ss 90 and 92 depending on the type classification. When the 
modules 46, 48, 50, and 52 to create a digital representation type classification is Type Zero, the FO or HO excitation 
of a frame. For example, the portion of the bitstream reconstruction modules 104 or 114 are activated, 
generated by the initial full-rate frame-processing module 46 Conversely, when the type classification is Type One, the Fl 
and the full-rate module 56 may be assembled to form the or HI excitation reconstruction modules 106 or 116 are 
bitstream generated when the full-rate encoder 36 is acti- 60 activated. The F0 or Fl LPC reconstruction modules 108 or 
vated to encode a frame. The bitstreams from each of the 110 are activated by the Type Zero and Type One type 
encoders 36, 38, 40, and 42 may be further assembled to classifications, respectively. The H LPC reconstruction mod- 
form a bitstream representing a plurality of frames of the ule 118 is activated based solely on the rate selection, 
speech signal 18. The bitstream generated by the encoders The quarter-rate decoder 94 includes a Q excitation recon- 
36, 38, 40, and 42 is decoded by the decoding system 16. 65 struction module 120 and a Q LPC reconstruction module 

FIG. 3 is an expanded block diagram of the decoding 122. Similarly, the eighth-rate decoder 96 includes an E 

system 16 illustrated in FIG. 1. One embodiment of the excitation reconstruction module 124 and an E LPC recon- 
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siruction module 126. Bo lh the respective Q or E excitation are generated on a frame basis or on a subframe basis, 

reconstruction modules 120 or 124 and the respective Q or respectively, by the encoding system 12. As will be 

E LPC reconstruction modules 122 or 126 are activated described later, the first portion and the second portion of the 

based solely on the rate selection. bitstrcam vary depending on the codec 22, 24, 26, and 28 

Each of the excitation reconstruction modules is operable 5 selected to encode and decode a frame of the speech signal 

to provide the short-term excitation on a short-term excita- 18, 

tion line 128 when activated. Similarly, each of the LPC i.i Bit Allocation for the Full-Rate Codec 

reconstruction modules operate to generate the short-term Referring now to FIGS. 2, 3, and 4, the full-rate bitstream 

prediction coefficients on a short-term prediction coefficients of the fuu_ rat e codec 22 will be described. Referring now to 

line 130. The short-term excitation and the short-term pre- ]0 FIG 4 the bit a n ocat j on for thc fall-rate codec 22 includes 

diction coefficients are provided to the synthesis filter 98. In a Une S p ectrum frequency (LSF) component 140, a type 

addition, m one embodiment, the short-term prediction component 142, an adaptive codebook component 144, a 

coefficients are provided to the post-processing module 100 fij£ed comp onent 146 and a gain component 147. 

as illustrated in FIG. 3. Th e g am component 147 comprises an adaptive codebook 

The post-processing module 100 can include filtering, ]5 ga j n component 148 and a fixed codebook gain component 
signal enhancement, noise modification, amplification, tilt 150. The bitstream allocation is further defined by a Type 
correction and other similar techniques capable of iraprov- Zero column 152 and a Type One column 154. Thc Type 
ing the perceptual quality of the synthesized speech. The Zero and Type One columns 152 and 154 designate the 
post-processing module 100 is operable to decrease the allocation of the bits in the bitstream based on the type 
audible noise without degrading the synthesized speech. 2 o classification of the speech signal 18 as previously dis- 
Decreasing the audible noise may be accomplished by cussed. In one embodiment, the Type Zero column 152 and 
emphasizing the formant structure of the synthesized speech the Type One column 154 both use 4 subframes of 5 
or by suppressing only the noise in the frequency regions milliseconds each to process the speech signals 18. 
that are perceptually not relevant for the synthesized speech. The initial full frame-processing module 46, illustrated in 
Since audible noise becomes more noticeable at lower bit 25 pTQ. 2, generates the LSF component 140. The LSF corn- 
rates, one embodiment of the post-processing module 100 ponent 140 is generated based on the short-term predictor 
may be activated to provide post-processing of the synthe- parameters. The short-term predictor parameters are con- 
sized speech differently depending on the rate selection. V erted to a plurality of line spectrum frequencies (LSFs). 
Another embodiment of the post-processing module 100 The LSFs represent the spectral envelope of a frame. In 
may be operable to provide different post-processing to 30 addition, a plurality of predicted LSFs from the LSFs of 
different groups of the decoders 90, 92, 94, and 96 based on previous frames are determined. The predicted LSFs are 
the rate selection. subtracted from the LSFs to create an LSFs prediction error. 

During operation, the initial frame-processing module 44 In one embodiment, the LSFs prediction error comprises a 
illustrated in FIG. 2 analyzes the speech signal 18 to vector of 10 parameters. The LSF prediction error is corn- 
determine the rate selection and activate one of the codecs 35 bined with the predicted LSFs to generate a plurality of 
22, 24, 26, and 28. If for example, the full-rate codec 22 is quantized LSFs. The quantized LSFs are interpolated and 
activated to process a frame based on the rate selection, the converted to form a plurality of quantized LPC coefficients 
initial full-rate frame -processing module 46 determines the Aq(z) for each subframe as will be discussed in detail later, 
type classification for the frame and generates a portion of l n addition, the LSFs prediction error is quantized to gen- 
the bitstream. The full-rate module 56, based on the type 40 erate the LSF component 140 that is transmitted to the 
classification, generates the remainder of the bitstream for decoding system 16. 

the frame. When the bitstream is received at the decoding system 16, 

The bitstream may be received and decoded by the the LSF component 140 is used to locate a quantized vector 

full-rate decoder 90 based on the rate selection. The full-rate representing a quantized LSFs prediction error. The quan- 

decoder 90 decodes the bitstream utilizing the type classi- 45 lized LSFs prediction error is added to the predicted LSFs to 

ficalion that was determined during encoding. The synthesis generate quantized LSFs. The predicted LSFs are deter- 

filter 98 and the post-processing module 100 use the param- mined from the LSFs of previous frames within the decod- 

eters decoded from the bitstream to generate the post- ing system 16 similarly to the encoding system 12. The 

processed synthesized speech 20. The bitstream that is resulting quantized LSFs may be interpolated for each 

generated by each of the codecs 22, 24, 26, and 28 contains 50 subframe using a predetermined weighting. The predeter- 

significantly different bit allocations to emphasize different mined weighting defines an interpolation path that may be 

parameters and/or characteristics of the speech signal 18 fixed or variable. The interpolation path is between the 

within a frame. quantized LSFs of the previous frame and the quantized 

1 .0 Bit Allocation LSFs of the current frame. The interpolation path may be 

FIGS. 4, 5, 6 and 7 are tables illustrating one embodiment 55 used to provide a spectral envelope representation for each 

of the bit-allocation for the full-rate codec 22, the half-rate subframe in the current frame. 

codec 24, the quarter-rate codec 26, and the eighth-rate For frames classified as Type Zero, one embodiment of 
codec 28, respectively. The bit-allocation designates the the LSF component 140 is encoded utilizing a plurality of 
portion of the bitstream generated by the initial frame- stages 156 and an interpolation element 158 as illustrated in 
processing module 44, and the portion of the bitstream 60 FIG. 4. The stages 156 represent the LSFs prediction error 
generated by the excitation -processing module 54 within a used to code the LSF component 140 for a frame. The 
respective encoder 36, 38, 40, and 42. In addition the interpolation element 158 may be used to provide a plurality 
bit-allocation designates the number of bits in the bitstream of interpolation paths between the quantized LSFs of the 
that represent a frame. Accordingly, the bit rate varies previous frame and the quantized LSFs of the frame cur- 
depending on the codec 22, 24, 26, and 28 that is activated. 65 rently being processed. In general, the interpolation element 
The bitstream may be classified into a first portion and a 158 represents selectable adjustment in the contour of the 
second portion depending on whether the representative bits line spectrum frequencies (LSFs) during decoding. Select - 



11/18/2003, EAST Version: 1.4.1 



US 6,574,593 Bl 



15 



16 



able adjustment may be used due to the non-stationary 
spectral nature of frames that are classified as Type Zero. For 
frames classified as Type One, the LSF component 140 may 
be encoded using only the stages 156 and a predetermined 
linear interpolation path due to the stationary spectral nature 
of such frames. 

One embodiment of the LSF component 140 includes 2 
bits to encode the interpolation element 158 for frames 
classified as Type Zero. The bits identify the particular 
interpolation path. Each of the interpolation paths adjusts the
weighting of the previous quantized LSFs for each subframe
and the weighting of the current quantized LSFs for each 
subframe. Selection of an interpolation path may be deter- 
mined based on the degree of variations in the spectral 
envelope between subsequent subframes. For example, if 
there is substantial variation in the spectral envelope in the 
middle of the frame, the interpolation element 158 selects an 
interpolation path that decreases the influence of the quan- 
tized LSFs from the previous frame. One embodiment of the 
interpolation element 158 can represent any one of four 
different interpolation paths for each subframe. 
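
The selectable interpolation described above can be sketched as follows. This is a minimal illustration, not the implementation from the microfiche appendix: the function name `interpolate_lsfs` and every weight in `PATHS` are assumptions chosen for the example, since the text only states that a 2-bit element selects one of four paths and that each path sets the per-subframe weighting of the previous and current quantized LSFs.

```python
# Hypothetical weights, NOT taken from the patent: PATHS[p][k] is the weight
# of the current frame's quantized LSFs in subframe k for path p; the
# previous frame's quantized LSFs receive the complementary weight.
PATHS = [
    [0.25, 0.50, 0.75, 1.00],  # path 0: linear transition
    [0.50, 0.75, 1.00, 1.00],  # path 1: reaches the current frame early
    [0.00, 0.25, 0.50, 1.00],  # path 2: stays near the previous frame longer
    [1.00, 1.00, 1.00, 1.00],  # path 3: previous frame has no influence
]


def interpolate_lsfs(prev_lsfs, curr_lsfs, path_index):
    """Return one interpolated LSF vector per subframe (4 subframes)."""
    if not 0 <= path_index < len(PATHS):
        raise ValueError("path index must fit in 2 bits (0..3)")
    if len(prev_lsfs) != len(curr_lsfs):
        raise ValueError("LSF vectors must have the same length")
    return [
        [(1.0 - w) * p + w * c for p, c in zip(prev_lsfs, curr_lsfs)]
        for w in PATHS[path_index]
    ]
```

In this sketch a path such as index 1 or 3 reduces the influence of the previous frame's quantized LSFs, which corresponds to the case of substantial spectral variation in the middle of the frame.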

The predicted LSFs may be generated using a plurality of 
moving average predictor coefficients. The predictor coef- 
ficients determine how much of the LSFs of past frames are 
used to predict the LSFs of the current frame. The predictor 
coefficients within the full-rate codec 22 use an LSF pre- 
dictor coefficients table. The table may be generally illus- 
trated by the following matrix: 

TABLE 1

[Matrix not reproduced in this text: m vectors of predictor coefficients, each comprising n elements.]

In one embodiment, m equals 2 and n equals 10. Accordingly, the prediction order is two and there are two vectors of predictor coefficients, each comprising 10 elements. One embodiment of the LSF predictor coefficients table is titled "Float64 B_85k" and is included in Appendix B of the attached microfiche appendix.

Once the predicted LSFs have been determined, the LSFs prediction error may be calculated using the actual LSFs. The LSFs prediction error may be quantized using a full dimensional multi-stage quantizer. An LSF prediction error quantization table containing a plurality of quantization vectors represents each stage 156 that may be used with the multi-stage quantizer. The multistage quantizer determines a portion of the LSF component 140 for each stage 156. The determination of the portion of the LSF component 140 is based on a pruned search approach. The pruned search approach determines promising quantization vector candidates from each stage. At the conclusion of the determination of candidates for all the stages, a decision occurs simultaneously that selects the best quantization vectors for each stage.

In the first stage, the multistage quantizer determines a plurality of candidate first stage quantization errors. The candidate first stage quantization errors are the difference between the LSFs prediction error and the closest matching quantization vectors located in the first stage. The multistage quantizer then determines a plurality of candidate second stage quantization errors by identifying the quantization vectors located in the second stage that best match the candidate first stage quantization errors. This iterative process is completed for each of the stages and promising candidates are kept from each stage. The final selection of the best representative quantization vectors for each stage simultaneously occurs when the candidates have been determined for all the stages. The LSF component 140 includes index locations of the closest matching quantization vectors from each stage. One embodiment of the LSF component 140 includes 25 bits to encode the index locations within the stages 156. The LSF prediction error quantization table for the quantization approach may be illustrated generally by the following matrix:

TABLE 2

[Matrix not reproduced in this text: j stages of quantization vectors, each vector comprising n elements.]




One embodiment of the quantization table for both the Type Zero and the Type One classification uses four stages (j=4) in which each quantization vector is represented by 10 elements (n=10). The stages 156 of this embodiment include 128 quantization vectors (r=128) for one of the stages 156, and 64 quantization vectors (s=64) in the remaining stages 156. Accordingly, the index location of the quantization vectors within the stages 156 may be encoded using 7 bits for the one of the stages 156 that includes 128 quantization vectors. In addition, index locations for each of the stages 156 that include 64 quantization vectors may be encoded using 6 bits. One embodiment of the LSF prediction error quantization table used for both the Type Zero and Type One classification is titled "Float64 CBes_85k" and is included in Appendix B of the attached microfiche appendix.
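
The stage sizes above account for the 25 index bits (7 + 6 + 6 + 6), and the pruned multi-stage search can be sketched in a few lines. The sketch below is an illustration under stated assumptions, not the appendix code: it assumes the 128-vector stage comes first, keeps a hypothetical `keep` best candidates per stage, and uses squared error as the match criterion. The names `index_bits`, `pack_indices` and `pruned_msvq` are invented for the example.

```python
STAGE_SIZES = [128, 64, 64, 64]  # assumption: the 128-vector stage is first


def index_bits(stage_sizes):
    """Bits needed to address one vector in each stage."""
    return [(size - 1).bit_length() for size in stage_sizes]


def pack_indices(indices, stage_sizes):
    """Concatenate the per-stage indices into one integer, first stage in the top bits."""
    word = 0
    for idx, bits in zip(indices, index_bits(stage_sizes)):
        word = (word << bits) | idx
    return word


def pruned_msvq(target, stages, keep=2):
    """Multi-stage search that keeps the `keep` best candidates after each stage.

    stages is a list of codebooks; each codebook is a list of vectors.
    Returns (indices, quantized_vector) for the best surviving candidate.
    """
    # Each candidate is (squared_error, chosen_indices, remaining_residual).
    candidates = [(0.0, [], list(target))]
    for codebook in stages:
        expanded = []
        for _, indices, residual in candidates:
            for i, vec in enumerate(codebook):
                new_res = [r - v for r, v in zip(residual, vec)]
                err = sum(x * x for x in new_res)
                expanded.append((err, indices + [i], new_res))
        expanded.sort(key=lambda cand: cand[0])
        candidates = expanded[:keep]  # prune: keep only promising candidates
    _, best_indices, best_residual = candidates[0]
    quantized = [t - r for t, r in zip(target, best_residual)]
    return best_indices, quantized
```

The final choice is made only after the last stage, so a candidate that is not the best match in an early stage can still win overall, which is the point of deferring the decision.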

Within the decoding system 16, the F0 or F1 LPC reconstruction modules 108, 110 in the full-rate decoder 90 obtain the LSF component 140 from the bitstream as illustrated in FIG. 3. The LSF component 140 may be used to reconstruct the quantized LSFs as previously discussed. The quantized LSFs may be interpolated and converted to form the linear prediction coding coefficients for each subframe of the current frame.
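
The prediction and reconstruction steps can be sketched as below, using the stated prediction order of two (m=2) and vectors of 10 elements (n=10). This is an assumption-laden illustration: the text does not spell out exactly which past quantities enter the moving average predictor, so `history` is left generic here, and the coefficient values used in any call are placeholders rather than the contents of the "Float64 B_85k" table. The function names are invented for the example.

```python
def predict_lsfs(history, coeffs):
    """Moving-average style prediction.

    history[k] and coeffs[k] are n-element vectors for the k-th most recent
    frame; the prediction is the element-wise weighted sum over the m frames.
    """
    n = len(coeffs[0])
    return [
        sum(coeffs[k][i] * history[k][i] for k in range(len(coeffs)))
        for i in range(n)
    ]


def prediction_error(actual_lsfs, predicted_lsfs):
    """Encoder side: the error vector that is subsequently quantized."""
    return [a - p for a, p in zip(actual_lsfs, predicted_lsfs)]


def reconstruct_lsfs(predicted_lsfs, quantized_error):
    """Decoder side: add the decoded error vector back to the prediction."""
    return [p + e for p, e in zip(predicted_lsfs, quantized_error)]
```

Because the encoder and the decoder form the same prediction from the same history, only the quantized error indices need to be transmitted in the LSF component.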

For Type Zero classification, reconstruction may be performed by the F0 LPC reconstruction module 108. Reconstruction involves determining the predicted LSFs, decoding the quantized LSFs prediction error and reconstructing the quantized LSFs. In addition, the quantized LSFs may be interpolated using the identified interpolation path. As previously discussed, one of the four interpolation paths is identified to the F0 LPC reconstruction module 108 by the interpolation element 158 that forms a part of the LSF component 140. Reconstruction of the Type One classification involves the use of the predetermined linear interpolation path and the LSF prediction error quantization table by the F1 LPC reconstruction module 110. The LSF component 140 forms part of the first portion of the bitstream since it is encoded on a frame basis in both the Type Zero and the Type One classifications.

The type component 142 also forms part of the first portion of the bitstream. As illustrated in FIG. 2, the F type selector module 68 generates the type component 142 to represent the type classification of a particular frame. Referring now to FIG. 3, the F type selector module 102 in the full-rate decoder 90 receives the type component 142 from the bitstream.

One embodiment of the adaptive codebook component 
144 may be an open loop adaptive codebook component 
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144a or a closed loop adaptive codebook component 1446. locations where each of the sample locations contains a 

The open or closed loop adaptive codebook component sample value. The tracks of the corresponding representative 

144a, 144b is generated by the initial full frame-processing pulses list only a portion of the sample locations from a 

module 46 or the FO first sub frame -processing module 70, subframe. Each of the representative pulses within one of the 

respectively, as illustrated in FIG. 2. The open loop adaptive s n-pulse codebooks may be represented by one of the pulse 

codebook component 144a may be replaced by the closed locations in the corresponding track, 

loop adaptive codebook component 144b in the bitstream During operation, each of the representative pulses is 

when the frame is classified as Type Zero. In general, the sequentially placed in each of the pulse locations in the 

open loop designation refers to processing on a frame basis corresponding track. The representative pulses are converted 

that does not involve analysis-by-synthesis (ABS). The 10 to a signal that may be compared to the sample values in the 

closed loop processing is performed on a subframe basis and sample locations of the subframe using ABS. The represen- 

includes analysis-by-synthesis (ABS). totive pulses are compared to the sample values in those 

Encoding the pitch lag, which is based on the periodicity sample locations that are later in time than the sample 

of the speech signal 18, generates the adaptive codebook location of the pulse location. The pulse location that 

component 144. The open loop adaptive codebook compo- is minimizes the difference between the representative pulse 

nent 144a is generated for a frame; whereas the closed loop and the sample values that are later in time forms a portion 

adaptive codebook component 144b is generated on a sub- °f the Type Zero fixed codebook component 146a. Each of 

frame basis. Accordingly, the open loop adaptive codebook the representative pulses in a selected n-pulse codebook may 

component 144a is part of the first portion of the bitstream be represented by a corresponding pulse location that forms 

and the closed loop adaptive codebook component 144b is 20 a portion of the Type Zero fixed codebook component 146a. 

part of the second portion of the bitstream. In one The tracks are contained in track tables that can generally be 

embodiment, as Ulustrated in FIG. 4, the open loop adaptive represented by the following matrix: 
codebook component 144a comprises 8 bits and the closed 

loop adaptive codebook component 144b comprises 26 bits. TABLE 3 

The open loop adaptive codebook component 144a and the 25 
closed loop adaptive codebook component 144b may be 
generated using an adaptive codebook vector that will be 
described later. Referring now to FIG. 3, the decoding 
system 16 receives the open or closed loop adaptive code- 
book component 144a or 144b. The open or closed loop 30 
adaptive codebook component 144a or 144b is decoded by 

the FO or Fl excitation reconstruction module 104 or 106, 

respectively. 

One embodiment of the fixed codebook component 146 In one embodiment, the track tables are the tables entitled 

may be a Type Zero fixed codebook component 146a or a 35 "static short track_5_4_0/' "static short track_5_3__2," 

Type One fixed codebook component 146b. The Type Zero and "static short track_5_3_l" within the library titled 

fixed codebook component 146a is generated by the F0 first "tracks.tab" that is included in Appendix B of the attached 

subframe-processing module 70 as illustrated in FIG. 2. The microfiche appendix. 

Fl subframe-processing module 72 generates the Type One In the example embodiment illustrated in FIG. 4, the 

fixed codebook component 146b. The Type Zero or Type 40 n-pulse codebooks are three 5-pulse codebooks 160 where 

One fixed codebook component 146a or 146b is generated the first of the three 5-pulse codebooks 160 includes 5 

using a fixed codebook vector and synthesis-by-analysis on representative pulses therefore n-5. A first representative 

a subframe basis that will be described later. The fixed pulse has a track that includes 16 (f— 16) of the 40 sample 

codebook component 146 represents the long-term residual locations in the subframe. The first representative pulse from 

of a subframe using an n-pulsc codebook, where n is the 45 the first of the three 5-pulse codebooks 160 arc compared 

number of pulses in the codebook. with the sample values in the sample locations. One of the 

Referring now to FIG. 4, the Type Zero fixed codebook sample locations present in the track associated with the first 
component 146a of one embodiment comprises 22 bits per representative pulse is identified as the pulse location using 
subframe. The Type Zero fixed codebook component 146a 4 bits. The sample location that is identified in the track is 
includes identification of one of a plurality of n-pulse 50 the sample location in the subframe that minimizes the 
codebooks, pulse locations in the codebook, and the signs of difference between the first representative pulse and the 
representative pulses (quantity "n") that correspond to the sample values that are later in time as previously discussed, 
pulse locations. In an example embodiment, up to two bits Identification of the pulse location in the track forms a 
designate which one of three n-pulse codebooks has been portion of the Type Zero fixed codebook component 146a. 
encoded. Specifically, the first of the two bits is set to "1" to 55 In this example embodiment, the second and fourth rep- 
designate the first of the three n-pulse codebooks is used. If resentative pulses have corresponding tracks with 16 sample 
the first bit is set to "0," the second of the two bits designates locations (g and i=16) and the third and fifth representative 
whether the second or the third of the three n-pulse code- pulses have corresponding tracks with 8 sample locations (h 
books are used. Accordingly, in the example embodiment, and j«8). Accordingly, the pulse locations for the second and 
the first of the three n-pulse codebooks has 21 bits to 60 fourth representative pulses are identified using 4 bits and 
represent the pulse locations and signs, and the second and the pulse locations of the third and fifth representative pulses 
third of the three n-pulse codebooks have 20 bits available. are identified using 3 bits. As a result, the Type Zero fixed 

Each of the representative pulses within one of the n-pulse codebook component 146a for the first of the three 5-pulse 

codebooks includes a corresponding track. The track is a list codebooks 160 includes 18 bits for identifying the pulse 

of sample locations in a subframe where each sample 65 locations. 

location in the list is one of the pulse locations. A subframe The signs of the representative pulses in the identified 

being encoded may be divided into a plurality of sample pulse locations may also be identified in the Type Zero fixed 
codebook component 146a. In the example embodiment, one bit represents the sign for the first representative pulse, one bit represents a combined sign for both the second and fourth representative pulses and one bit represents the combined sign for the third and the fifth representative pulses. The combined sign uses the redundancy of the information in the pulse locations to transmit two distinct signs with a single bit. Accordingly, the Type Zero fixed codebook component 146a for the first of the three 5-pulse codebooks 160 includes three bits for the sign designation for a total of 21 bits.

In an example embodiment, the second and third of the three 5-pulse codebooks 160 also include 5 representative pulses (n=5) and the tracks in the track table each comprise 8 sample locations (f,g,h,i,j=8). Accordingly, the pulse locations for each of the representative pulses in the second and third of the three 5-pulse codebooks 160 are identified using 3 bits. In addition, in this example embodiment, the signs for each of the pulse locations are identified using 1 bit.

For frames classified as Type One, in an example embodiment, the n-pulse codebook is an 8-pulse codebook 162 (n=8). The 8-pulse codebook 162 is encoded using 30 bits per subframe to create one embodiment of the Type One fixed codebook component 146b. The 30 bits include 26 bits identifying pulse locations using tracks as in the Type Zero classification, and 4 bits identifying the signs. One embodiment of the track table is the table entitled "static INT16 track_8_4_0" within the library titled "tracks.tab" that is included in Appendix B of the attached microfiche appendix.

In the example embodiment, the tracks associated with the first and fifth representative pulses comprise 16 sample locations that are encoded using 4 bits. The tracks associated with the remaining representative pulses comprise 8 sample locations that are encoded using 3 bits. The first and fifth representative pulses, the second and sixth representative pulses, the third and seventh representative pulses, and the fourth and eighth representative pulses use the combined signs for both respective representative pulses. As illustrated in FIG. 3, when the bitstream is received by the decoding system 16, the F0 or the F1 excitation reconstruction modules 104 or 106 decode the pulse locations of the tracks. The pulse locations of the tracks are decoded by the F0 or the F1 excitation reconstruction modules 104 or 106 for one of the three 5-pulse codebooks 160 or the 8-pulse codebook 162, respectively. The fixed codebook component 146 is part of the second portion of the bitstream since it is generated on a subframe basis.

Referring again to FIG. 4, the gain component 147, in general, represents the adaptive and fixed codebook gains. For Type Zero classification, the gain component 147 is a Type Zero adaptive and fixed codebook gain component 148a, 150a representing both the adaptive and the fixed codebook gains. The Type Zero adaptive and fixed codebook gain component 148a, 150a is part of the second portion of the bitstream since it is encoded on a subframe basis. As illustrated in FIG. 2, the Type Zero adaptive and fixed codebook gain component 148a, 150a is generated by the F0 first subframe-processing module 70.

For each subframe of a frame classified as Type Zero, the adaptive and fixed codebook gains are jointly coded by a two-dimensional vector quantizer (2D VQ) 164 to generate the Type Zero adaptive and fixed codebook gain component 148a, 150a. In one embodiment, quantization involves translating the fixed codebook gain into a fixed codebook energy in units of decibels (dB). In addition, a predicted fixed codebook energy may be generated from the quantized fixed codebook energy values of previous frames. The predicted fixed codebook energy may be derived using a plurality of fixed codebook predictor coefficients. Similar to the LSF predictor coefficients, the fixed codebook predictor coefficients determine how much of the fixed codebook energy of past frames may be used to predict the fixed codebook energy of the current frame. The predicted fixed codebook energy is subtracted from the fixed codebook energy to generate a prediction fixed codebook energy error. By adjusting the weighting of the previous frames and the current frames for each subframe, the predicted fixed codebook energy may be calculated to minimize the prediction fixed codebook error.

The prediction fixed codebook energy error is grouped with the adaptive codebook gain to form a two-dimensional vector. Following quantization of the prediction fixed codebook energy error and the adaptive codebook gain, as later described, the two-dimensional vector may be referred to as a quantized gain vector. The two-dimensional vector is compared to a plurality of predetermined vectors in a 2D gain quantization table. An index location is identified that is the location in the 2D gain quantization table of the predetermined vector that best represents the two-dimensional vector. The index location is the adaptive and fixed codebook gain component 148a and 150a for the subframe. The adaptive and fixed codebook gain component 148a and 150a for the frame represents the indices identified for each of the subframes.

The predetermined vectors comprise 2 elements, one representing the adaptive codebook gain, and one representing the prediction fixed codebook energy error. The 2D gain quantization table may be generally represented by:

TABLE 4

The two-dimensional vector quantizer (2D VQ) 164, of one embodiment, utilizes 7 bits per subframe to identify the index location of one of 128 quantization vectors (n=128). One embodiment of the 2D gain quantization table is entitled "Float64 gainVQ_2_128_8_5" and is included in Appendix B of the attached microfiche appendix.

For frames classified as Type One, a Type One adaptive codebook gain component 148b is generated by the F1 first frame-processing module 72 as illustrated in FIG. 2. Similarly, the F1 second frame-processing module 76 generates a Type One fixed codebook gain component 150b. The Type One adaptive codebook gain component 148b and the Type One fixed codebook gain component 150b are generated on a frame basis to form part of the first portion of the bitstream.

Referring again to FIG. 4, the Type One adaptive codebook gain component 148b is generated using a multi-dimensional vector quantizer that is a four-dimensional pre vector quantizer (4D pre VQ) 166 in one embodiment. The term "pre" is used to highlight that, in one embodiment, the adaptive codebook gains for all the subframes in a frame are quantized prior to the search in the fixed codebook for any of the subframes. In an alternative embodiment, the multi-dimensional quantizer is an n-dimensional vector quantizer that quantizes vectors for n subframes where n may be any number of subframes.

The vector quantized by the four-dimensional pre vector quantizer (4D pre VQ) 166 is an adaptive codebook gain
vector with elements that represent each of the adaptive codebook gains from each of the subframes. Following quantization, as will be later discussed, the adaptive codebook gain vector can also be referred to as a quantized pitch gain (ĝ_a^k). Quantization of the adaptive codebook gain vector to generate the adaptive codebook gain component 148b is performed by searching in a pre-gain quantization table. The pre-gain quantization table includes a plurality of predetermined vectors that may be searched to identify the predetermined vector that best represents the adaptive codebook gain vector. The index location of the identified predetermined vector within the pre-gain quantization table is the Type One adaptive codebook component 148b. The adaptive codebook gain component 148b of one embodiment comprises 6 bits.

In one embodiment, the predetermined vectors comprise 4 elements, 1 element for each subframe. Accordingly, the pre-gain quantization table may be generally represented as:

TABLE 5

One embodiment of the pre-gain quantization table includes 64 predetermined vectors (n=64). An embodiment of the pre-gain quantization table is entitled "Float64 gp4_tab" and is included in Appendix B of the attached microfiche appendix.

The Type One fixed codebook gain component 150b may be similarly encoded using a multi-dimensional vector quantizer for n subframes. In one embodiment, the multi-dimensional vector quantizer is a four-dimensional delayed vector quantizer (4D delayed VQ) 168. The term "delayed" highlights that the quantization of the fixed codebook gains for the subframes occurs only after the search in the fixed codebook for all the subframes. Referring again to FIG. 2, the F1 second frame-processing module 76 determines the fixed codebook gain for each of the subframes. The fixed codebook gain may be determined by first buffering parameters generated on a subframe basis until the entire frame has been processed. When the frame has been processed, the fixed codebook gains for all of the subframes are quantized using the buffered parameters to generate the Type One fixed codebook gain component 150b. In one embodiment, the Type One fixed codebook gain component 150b comprises 10 bits as illustrated in FIG. 4.

The Type One fixed codebook gain component 150b is generated by representing the fixed codebook gains with a plurality of fixed codebook energies in units of decibels (dB). The fixed codebook energies are quantized to generate a plurality of quantized fixed codebook energies, which are then translated to create a plurality of quantized fixed codebook gains. In addition, the fixed codebook energies are predicted from the quantized fixed codebook energy errors of the previous frames to generate a plurality of predicted fixed codebook energies. The difference between the predicted fixed codebook energies and the fixed codebook energies is a plurality of prediction fixed codebook energy errors. In one embodiment, different prediction coefficients may be used for each of 4 subframes to generate the predicted fixed codebook energies. In this example embodiment, the predicted fixed codebook energies of the first, the second, the third, and the fourth subframe are predicted from the 4 quantized fixed codebook energy errors of the previous frame. The prediction coefficients for the first, second, third, and fourth subframes of this example embodiment may be {0.7, 0.6, 0.4, 0.2}, {0.4, 0.2, 0.1, 0.05}, {0.3, 0.2, 0.075, 0.025}, and {0.2, 0.075, 0.025, 0.0}, respectively.

The prediction fixed codebook energy errors may be grouped to form a fixed codebook gain vector that, when quantized, may be referred to as a quantized fixed codebook gain (ĝ_c^k). In one embodiment, the prediction fixed codebook energy error for each subframe represents the elements in the vector. The prediction fixed codebook energy errors are quantized using a plurality of predetermined vectors in a delayed gain quantization table. During quantization, a perceptual weighting measure may be incorporated to minimize the quantization error. An index location that identifies the predetermined vector in the delayed gain quantization table is the fixed codebook gain component 150b for the frame.

The predetermined vectors in the delayed gain quantization table of one embodiment include 4 elements. Accordingly, the delayed gain quantization table may be represented by the previously discussed Table 5. One embodiment of the delayed gain quantization table includes 1024 predetermined vectors (n=1024). An embodiment of the delayed gain quantization table is entitled "Float64 gainVQ_4_1024" and is included in Appendix B of the attached microfiche appendix.

Referring again to FIG. 3, the fixed and adaptive codebook gain components 148 and 150 may be decoded by the full-rate decoder 90 within the decoding system 16 based on the type classification. The F0 excitation reconstruction module 104 decodes the Type Zero adaptive and fixed codebook gain component 148a, 150a. Similarly, the Type One adaptive codebook gain component 148b and the Type One fixed gain component 150b are decoded by the F1 excitation reconstruction module 106.

Decoding of the fixed and adaptive codebook gain components 148 and 150 involves generation of the respective predicted gains, as previously discussed, by the full-rate decoder 90. The respective quantized vectors from the respective quantization tables are then located using the respective index locations. The respective quantized vectors are then assembled with the respective predicted gains to generate respective quantized codebook gains. The quantized codebook gains generated from the Type Zero fixed and adaptive gain component 148a and 150a represent the values for both the fixed and adaptive codebook gains for a subframe. The quantized codebook gain generated from the Type One adaptive codebook gain component 148b and the Type One fixed codebook gain component 150b represents the values for the fixed and adaptive codebook gains, respectively, for each subframe in a frame.

1.2 Bit Allocation for the Half-Rate Codec

Referring now to FIGS. 2, 3 and 5, the half-rate bitstream of the half-rate codec 24 will be described. The half-rate codec 24 is in many respects similar to the full-rate codec 22 but has a different bit allocation. As such, for purposes of brevity, the discussion will focus on the differences. Referring now to FIG. 5, the bitstream allocation of one embodiment of the half-rate codec 24 includes a line spectrum frequency (LSF) component 172, a type component 174, an adaptive codebook component 176, a fixed codebook component 178, and a gain component 179. The gain component 179 further comprises an adaptive codebook gain component 180 and a fixed codebook gain component 182. The bitstream of the half-rate codec 24 also is further defined by a Type Zero column 184 and a Type One column 186. In one
embodiment, the Type Zero column 184 uses two subframes
of 10 milliseconds each containing 80 samples. The Type
One column 186, of one embodiment, uses three subframes
where the first and second subframes contain 53 samples and
the third subframe contains 54 samples.

Although generated similarly to the full-rate codec 22, the
LSF component 172 includes a plurality of stages 188 and
a predictor switch 190 for both the Type Zero and the Type
One classifications. In addition, one embodiment of the LSF
component 172 comprises 21 bits that form part of the first
portion of the bitstream. The initial half frame-processing
module 48 illustrated in FIG. 2 generates the LSF component
172 similarly to the full-rate codec 22. Referring again
to FIG. 5, the half-rate codec 24 of one embodiment includes
three stages 188, two with 128 vectors and one with 64
vectors. The three stages 188 of the half-rate codec 24
operate similarly to the full-rate codec 22 for frames classified
as Type One with the exception of the selection of a
set of predictor coefficients as discussed later. The index
location of each of the 128 vectors is identified with 7 bits
and the index location of each of the 64 vectors is identified
with 6 bits. One embodiment of the LSF prediction error
quantization table for the half-rate codec 24 is titled
"Float64 CBes__40k" and is included in Appendix B of the
attached microfiche appendix.

The half-rate codec 24 also differs from the full-rate codec
22 in selecting between sets of predictor coefficients. The
predictor switch 190 of one embodiment identifies one of
two possible sets of predictor coefficients using one bit. The
selected set of predictor coefficients may be used to determine
the predicted line spectrum frequencies (LSFs), similar
to the full-rate codec 22. The predictor switch 190 determines
and identifies which of the sets of predictor coefficients
will best minimize the quantization error. The sets of
predictor coefficients may be contained in an LSF predictor
coefficient table that may be generally illustrated by the
following matrix:
In one embodiment there are four predictor coefficients
(m=4) in each of two sets (j=2) that comprise 10 elements
each (n=10). The LSF predictor coefficient table for the
half-rate codec 24 in one embodiment is titled "Float64
B_40k" and is included in Appendix B of the attached
microfiche appendix. Referring again to FIG. 3, the LSF
prediction error quantization table and the LSF predictor
coefficient table are used by the H LPC reconstruction
module 118 within the decoding system 16. The H LPC
reconstruction module 118 receives and decodes the LSF
component 172 from the bitstream to reconstruct the quantized
frame LSFs. Similar to the full-rate codec 22, for
frames classified as Type One, the half-rate codec 24 uses a
predetermined linear interpolation path. However, the half-rate
codec 24 uses the predetermined linear interpolation
path for frames classified as both Type Zero and Type One.

The adaptive codebook component 176 in the half-rate
codec 24 similarly models the pitch lag based on the
periodicity of the speech signal 18. The adaptive codebook
component 176 is encoded on a subframe basis for the Type
Zero classification and a frame basis for the Type One
classification. As illustrated in FIG. 2, the initial half
frame-processing module 48 encodes an open loop adaptive
codebook component 176a for frames with the Type One
classification. For frames with the Type Zero classification, the
H0 first subframe-processing module 80 encodes a closed
loop adaptive codebook component 176b.

Referring again to FIG. 5, one embodiment of the open
loop adaptive codebook component 176a is encoded by 7
bits per frame and the closed loop adaptive codebook
component 176b is encoded by 7 bits per subframe.
Accordingly, the Type One adaptive codebook component
176a is part of the first portion of the bitstream, and the Type
Zero adaptive codebook component 176b is part of the
second portion of the bitstream. As illustrated in FIG. 3, the
decoding system 16 receives the closed loop adaptive codebook
component 176b. The closed loop adaptive codebook
component 176b is decoded by the half-rate decoder 92
using the H0 excitation reconstruction module 114.
Similarly, the H1 excitation reconstruction module 116
decodes the open loop adaptive codebook component 176a.

One embodiment of the fixed codebook component 178
for the half-rate codec 24 is dependent on the type classification
to encode the long-term residual as in the full-rate
codec 22. Referring again to FIG. 2, a Type Zero fixed
codebook component 178a or a Type One fixed codebook
component 178b is generated by the H0 first subframe-processing
module 80 or the H1 second subframe-processing
module 84, respectively. Accordingly, the Type
Zero and Type One fixed codebook components 178a and
178b form a part of the second portion of the bitstream.

Referring again to FIG. 5, the Type Zero fixed codebook
component 178a of an example embodiment is encoded
using 15 bits per subframe, with up to two bits identifying the
codebook to be used as in the full-rate codec 22. Encoding
the Type Zero fixed codebook component 178a involves use
of a plurality of n-pulse codebooks that are a 2-pulse
codebook 192 and a 3-pulse codebook 194 in the example
embodiment. In addition, in this example embodiment, a
Gaussian codebook 195 is used that includes entries that are
random excitation. For the n-pulse codebooks, the half-rate
codec 24 uses the track tables similarly to the full-rate codec
22. In one embodiment, the track tables entitled "static INT16
track_2_7_1," "static INT16 track_1_3_0," and "static
INT16 track_3_2_0" included in the library entitled
"tracks.tab" in Appendix B of the microfiche appendix are
used.

In an example embodiment of the 2-pulse codebook 192,
each track in the track table includes 80 sample locations for
each representative pulse. The pulse locations for both the
first and second representative pulses are encoded jointly using 13
bits. Encoding the two pulse locations, each being 1 of 80
possible locations, is accomplished in 13 bits by identifying the
pulse location for the first representative pulse, multiplying that
pulse location by 80 and adding the pulse location of the second
representative pulse to the result. The end result is a value that can
be encoded in 13 bits with an additional bit used to represent
the signs of both representative pulses as in the full-rate
codec 22.
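
The joint location encoding described above can be sketched as follows. This is a minimal illustration rather than the listing in the microfiche appendix; the function names are hypothetical, and it assumes that pulse locations are zero-based indices into an 80-entry track.

```python
TRACK_SIZE = 80  # sample locations per track, as stated in the text


def encode_pulse_pair(loc1: int, loc2: int) -> int:
    """Combine two pulse locations into one value: loc1 * 80 + loc2."""
    if not (0 <= loc1 < TRACK_SIZE and 0 <= loc2 < TRACK_SIZE):
        raise ValueError("pulse locations must lie in 0..79")
    return loc1 * TRACK_SIZE + loc2


def decode_pulse_pair(code: int) -> tuple[int, int]:
    """Recover (loc1, loc2) from the combined value."""
    return divmod(code, TRACK_SIZE)


# The largest combined value is 79 * 80 + 79 = 6399, which is below 2**13.
largest = encode_pulse_pair(TRACK_SIZE - 1, TRACK_SIZE - 1)
assert largest < 2 ** 13
```

Because 80 x 80 = 6400 combinations fit within the 8192 values of a 13-bit field, joint coding saves one bit compared with coding each location separately in 7 bits.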

In an example embodiment of the 3-pulse codebook 194,
the pulse locations are generated by the combination of a
general location, that may be one of 16 sample locations
defined by 4 bits, and a relative displacement therefrom.
The relative displacement may be 3 values representing each
of the 3 representative pulses in the 3-pulse codebook 194.
The values represent the location difference away from the
general location and may be defined by 2 bits for each
representative pulse. The signs for the three representative
pulses may be each defined by one bit such that the total bits
for the pulse location and the signs is 13 bits.
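
The 13-bit total (4 + 3 x 2 + 3 x 1) can be illustrated with a simple bit-packing sketch. The field order, the function names, and the representation of a sign as a single 0/1 flag are assumptions made for illustration; the text does not specify how the fields are laid out in the bitstream.

```python
def pack_three_pulse(general: int, displacements: list[int], signs: list[int]) -> int:
    """Pack a 4-bit general location, three 2-bit displacements and
    three 1-bit signs into one 13-bit value (assumed field order)."""
    if not 0 <= general < 16:
        raise ValueError("general location needs 4 bits")
    if len(displacements) != 3 or any(not 0 <= d < 4 for d in displacements):
        raise ValueError("three 2-bit displacements expected")
    if len(signs) != 3 or any(s not in (0, 1) for s in signs):
        raise ValueError("three 1-bit signs expected")
    code = general
    for d in displacements:
        code = (code << 2) | d
    for s in signs:
        code = (code << 1) | s
    return code


def unpack_three_pulse(code: int) -> tuple[int, list[int], list[int]]:
    """Inverse of pack_three_pulse."""
    signs = [(code >> shift) & 1 for shift in (2, 1, 0)]
    code >>= 3
    displacements = [(code >> shift) & 3 for shift in (4, 2, 0)]
    general = code >> 6
    return general, displacements, signs
```

With every field at its maximum the packed value is 8191, the largest 13-bit number, which matches the bit count given in the text.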

The Gaussian codebook 195 generally represents noise
type speech signals that may be encoded using two orthogonal
basis random vectors. The Type Zero fixed codebook
component 178a represents the two orthogonal basis random
vectors generated from the Gaussian codebook 195.
The Type Zero fixed codebook component 178a represents
how to perturb a plurality of orthogonal basis random
vectors in a Gaussian table to increase the number of
orthogonal basis random vectors without increasing the
storage requirements. In an example embodiment, the number
of orthogonal basis random vectors is increased from 32
vectors to 45 vectors. A Gaussian table that includes 32
vectors with each vector comprising 40 elements represents
the Gaussian codebook of the example embodiment. In this
example embodiment, the two orthogonal basis random
vectors used for encoding are interleaved with each other to
represent 80 samples in each subframe. The Gaussian codebook
may be generally represented by the following matrix:

TABLE 7 
One embodiment of the Gaussian codebook 195 is titled
"double bv" and is included in Appendix B of the attached
microfiche appendix. For the example embodiment of the
Gaussian codebook 195, 11 bits identify the combined
indices (location and perturbation) of both of the two
orthogonal basis random vectors used for encoding, and 2
bits define the signs of the orthogonal basis random vectors.
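
The following sketch illustrates two points from the Gaussian codebook description: interleaving two 40-element vectors into an 80-sample subframe, and why 11 bits can index a pair drawn from 45 vectors. It rests on assumptions not stated in the text: that the first vector fills the even sample positions and the second the odd ones, that each sign is a multiplier of +1 or -1, and that the pair is indexed as i1 * 45 + i2. The perturbation that extends 32 stored vectors to 45 is not modelled.

```python
NUM_VECTORS = 45   # orthogonal basis random vectors after perturbation
VECTOR_LEN = 40    # elements per vector in the Gaussian table


def joint_index(i1: int, i2: int) -> int:
    """Hypothetical combined index for a pair of basis vectors."""
    if not (0 <= i1 < NUM_VECTORS and 0 <= i2 < NUM_VECTORS):
        raise ValueError("vector indices must lie in 0..44")
    return i1 * NUM_VECTORS + i2


def interleave(v1, v2, sign1=1, sign2=1):
    """Interleave two equal-length vectors, applying one sign to each."""
    if len(v1) != len(v2):
        raise ValueError("vectors must have the same length")
    out = []
    for a, b in zip(v1, v2):
        out.append(sign1 * a)  # assumed: even positions from the first vector
        out.append(sign2 * b)  # assumed: odd positions from the second vector
    return out


# 45 * 45 = 2025 pairs fit in 11 bits (2048 values).
assert joint_index(NUM_VECTORS - 1, NUM_VECTORS - 1) < 2 ** 11

subframe = interleave([0.0] * VECTOR_LEN, [1.0] * VECTOR_LEN, 1, -1)
assert len(subframe) == 80
```

Under these assumptions, 11 index bits plus 2 sign bits give 13 bits, the same number of bits used for pulse locations and signs in the pulse codebooks described above.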

Encoding the Type One fixed codebook component 178b
involves use of a plurality of n-pulse codebooks that are a
2-pulse codebook 196 and a 3-pulse codebook 197 in the
example embodiment. The 2-pulse codebook 196 and the
3-pulse codebook 197 function similarly to the 2-pulse
codebook 192 and the 3-pulse codebook 194 of the Type
Zero classification; however, the structure is different. The
Type One fixed codebook component 178b of an example
embodiment is encoded using 13 bits per subframe. Of the
13 bits, 1 bit identifies the 2-pulse codebook 196 or the
3-pulse codebook 197 and 12 bits represent the respective
pulse locations and the signs of the representative pulses. In
the 2-pulse codebook 196 of the example embodiment, the
tracks include 32 sample locations for each representative
pulse that are encoded using 5 bits with the remaining 2 bits
used for the sign of each representative pulse. In the 3-pulse
codebook 197, the general location includes 8 sample locations
that are encoded using 4 bits. The relative displacement
is encoded by 2 bits and the signs for the representative
pulses are encoded in 3 bits similar to the frames classified
as Type Zero.

Referring again to FIG. 3, the decoding system 16
receives the Type Zero or Type One fixed codebook components
178a and 178b. The Type Zero or Type One fixed
codebook components 178a and 178b are decoded by the H0
excitation reconstruction module 114 or the H1 reconstruction
module 116, respectively. Decoding of the Type Zero
fixed codebook component 178a occurs using an embodiment
of the 2-pulse codebook 192, the 3-pulse codebook
194, or the Gaussian codebook 195. The Type One fixed
codebook component 178b is decoded using the 2-pulse
codebook 196 or the 3-pulse codebook 197.

Referring again to FIG. 5, one embodiment of the gain
component 179 comprises a Type Zero adaptive and fixed
codebook gain component 180a and 182a. The Type Zero
adaptive and fixed codebook gain component 180a and 182a
may be quantized using the two-dimensional vector quantizer
(2D VQ) 164 and the 2D gain quantization table (Table
4), used for the full-rate codec 22. In one embodiment, the
2D gain quantization table is entitled "Float64 gainVQ_3_128",
and is included in Appendix B of the attached microfiche
appendix.

Type One adaptive and fixed codebook gain components
180b and 182b may also be generated similarly to the
full-rate codec 22 using multi-dimensional vector quantizers.
In one embodiment, a three-dimensional pre vector
quantizer (3D preVQ) 198 and a three-dimensional delayed
vector quantizer (3D delayed VQ) 200 are used for the
adaptive and fixed gain components 180b and 182b, respectively.
The vector quantizers 198 and 200 perform quantization
using respective gain quantization tables. In one
embodiment, the gain quantization tables are a pre-gain
quantization table and a delayed gain quantization table for
the adaptive and fixed codebook gains, respectively. The
multi-dimensional gain tables may be similarly structured
and include a plurality of predetermined vectors. Each
multi-dimensional gain table in one embodiment comprises
3 elements for each subframe of a frame classified as Type
One.

Similar to the full-rate codec 22, the three-dimensional
pre vector quantizer (3D preVQ) 198 for the adaptive gain
component 180b may quantize directly the adaptive gains.
In addition, the three-dimensional delayed vector quantizer
(3D delayed VQ) 200 for the fixed gain component 182b
may quantize the fixed codebook energy prediction error.
Different prediction coefficients may be used to predict the
fixed codebook energy for each subframe. In one preferred
embodiment, the predicted fixed codebook energies of the
first, the second, and the third subframes are predicted from
the 3 quantized fixed codebook energy errors of the previous
frame. In this example embodiment, the predicted fixed
codebook energies of the first, the second, and the third
subframes are predicted using the set of coefficients {0.6,
0.3, 0.1}, {0.4, 0.25, 0.1}, and {0.3, 0.15, 0.075}, respectively.
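
A minimal sketch of the per-subframe prediction follows, using the three coefficient sets quoted above (with the third set read as {0.3, 0.15, 0.075}). It assumes that each predicted energy is a plain weighted sum of the previous frame's 3 quantized fixed codebook energy errors; the ordering of those errors relative to the coefficients, and any mean or offset term, are not given in this passage and are left out.

```python
# One coefficient set per subframe of the current frame (from the text).
PREDICTION_COEFFS = [
    [0.6, 0.3, 0.1],      # first subframe
    [0.4, 0.25, 0.1],     # second subframe
    [0.3, 0.15, 0.075],   # third subframe
]


def predict_energies(prev_errors):
    """Predict the 3 subframe energies (dB) from the previous frame's
    3 quantized energy errors, as a weighted sum (assumed form)."""
    if len(prev_errors) != 3:
        raise ValueError("expected 3 quantized energy errors")
    return [
        sum(c * e for c, e in zip(coeffs, prev_errors))
        for coeffs in PREDICTION_COEFFS
    ]


def prediction_errors(energies, prev_errors):
    """Energy minus predicted energy: the quantity the 3D delayed VQ quantizes."""
    predicted = predict_energies(prev_errors)
    return [e - p for e, p in zip(energies, predicted)]
```

The coefficient sums shrink from the first subframe (1.0) to the third (0.525), so in this sketch the previous frame contributes less to the prediction for later subframes.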

The gain quantization tables for the half-rate codec 24 
may be generally represented as: 

TABLE 8 




One embodiment of the pre-gain quantization table used
by the three-dimensional pre vector quantizer (3D preVQ)
198 includes 16 vectors (n=16). The three-dimensional
delayed vector quantizer (3D delayed VQ) 200 uses one
embodiment of the delayed gain quantization table that
includes 256 vectors (n=256). The gain quantization tables
for the pre vector quantizer (3D preVQ) 198 and the delayed
vector quantizer (3D delayed VQ) 200 of one embodiment
are entitled "Float64 gp3_tab" and "Float64 gainVQ_3_256",
respectively, and are included in Appendix B of the
attached microfiche appendix.
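
As a cross-check, the half-rate bit counts given in this section can be tallied per frame. The sketch below derives each gain index width from the stated table size and assumes a one-bit type component 174, whose size is not stated in this passage; the dictionary keys are illustrative labels only.

```python
def index_bits(table_size: int) -> int:
    """Bits needed to index a table with table_size entries."""
    return (table_size - 1).bit_length()


TYPE_BITS = 1  # assumption: one bit for the type component 174

# Type Zero: two subframes per frame.
type_zero = {
    "lsf": 21,
    "type": TYPE_BITS,
    "adaptive_codebook": 7 * 2,          # closed loop, 7 bits per subframe
    "fixed_codebook": 15 * 2,            # 15 bits per subframe
    "gain_2d_vq": index_bits(128) * 2,   # 128-entry 2D table per subframe
}

# Type One: three subframes per frame.
type_one = {
    "lsf": 21,
    "type": TYPE_BITS,
    "adaptive_codebook": 7,              # open loop, 7 bits per frame
    "fixed_codebook": 13 * 3,            # 13 bits per subframe
    "adaptive_gain_3d_pre_vq": index_bits(16),
    "fixed_gain_3d_delayed_vq": index_bits(256),
}

total_type_zero = sum(type_zero.values())
total_type_one = sum(type_one.values())
```

Under that assumption both classifications come to 80 bits per frame, which corresponds to 4000 bits per second for the 20 millisecond frame implied by two 10 millisecond subframes.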

Referring again to FIG. 2, the Type Zero adaptive and 
fixed codebook gain component 180a and 182a is generated 
by the H0 first subframe-processing module 80. The H1 first frame-processing module 82 generates the Type One adaptive codebook gain component 180b. Similarly, the Type One fixed codebook gain component 182b is generated by the H1 second frame-processing module 86. Referring again to FIG. 3, the decoding system 16 receives the Type Zero adaptive and fixed codebook gain component 180a and 182a. The Type Zero adaptive and fixed codebook gain component 180a and 182a is decoded by the H0 excitation reconstruction module 114 based on the type classification. Similarly, the H1 excitation reconstruction module 116 decodes the Type One adaptive gain component 180b and the Type One fixed codebook gain component 182b.

1.3 Bit Allocation for the Quarter-Rate Codec

Referring now to FIGS. 2, 3 and 6, the quarter-rate bitstream of the quarter-rate codec 26 will now be explained. The illustrated embodiment of the quarter-rate codec 26 operates on both a frame basis and a subframe basis but does not include the type classification as part of the encoding process as in the full and half-rate codecs 22 and 24. Referring now to FIG. 6, the bitstream generated by quarter-rate codec 26 includes an LSF component 202 and an energy component 204. One embodiment of the quarter-rate codec 26 operates using two subframes of 10 milliseconds each to process frames using 39 bits per frame.

The LSF component 202 is encoded on a frame basis using a similar LSF quantization scheme as the full-rate codec 22 when the frame is classified as Type Zero. The quarter-rate codec 26 utilizes an interpolation element 206 and a plurality of stages 208 to encode the LSFs to represent the spectral envelope of a frame. One embodiment of the LSF component 202 is encoded using 27 bits. The 27 bits represent the interpolation element 206 that is encoded in 2 bits and four of the stages 208 that are encoded in 25 bits. The stages 208 include one stage encoded using 7 bits and three stages encoded using 6 bits. In one embodiment, the quarter-rate codec 26 uses the exact quantization table and predictor coefficients table used by the full-rate codec 22. The quantization table and the predictor coefficients table of one embodiment are titled "Float64 CBes_85k" and "Float64 B_85k", respectively, and are included in Appendix B of the attached microfiche appendix.

The energy component 204 represents an energy gain that may be multiplied by a vector of similar yet random numbers that may be generated by both the encoding system 12 and the decoding system 16. In one embodiment, the energy component 204 is encoded using 6 bits per subframe. The energy component 204 is generated by first determining the energy gain for the subframe based on the random numbers. In addition, a predicted energy gain is determined for the subframe based on the energy gain of past frames.

The predicted energy gain is subtracted from the energy gain to determine an energy gain prediction error. The energy gain prediction error is quantized using an energy gain quantizer and a plurality of predetermined scalars in an energy gain quantization table. Index locations of the predetermined scalars for each subframe may be represented by the energy component 204 for the frame.

The energy gain quantization table may be generally

TABLE 9

In one embodiment, the energy gain quantization table contains 64 (n=64) of the predetermined scalars. An embodiment of the energy gain quantization table is entitled [table name illegible in source].

In FIG. 2, the LSF component 202 is encoded on a frame basis by the initial quarter frame-processing module 50. Similarly, the energy component 204 is encoded by the quarter rate module 60 on a subframe basis. Referring now to FIG. 3, the decoding system 16 receives the LSF component 202. The LSF component 202 is decoded by the Q LPC reconstruction module 122 and the energy component 204 is decoded by the Q excitation reconstruction module 120. Decoding the LSF component 202 is similar to the decoding methods for the full-rate codec 22 for frames classified as Type One. The energy component 204 is decoded to determine the energy gain. A vector of similar yet random numbers generated within the decoding system 16 may be multiplied by the energy gain to generate the short-term excitation.

1.4 Bit Allocation for the Eighth-Rate Codec

In FIGS. 2, 3, and 7, the eighth-rate bitstream of the eighth-rate codec 28 may not include the type classification as part of the encoding process and may operate on a frame basis only. Referring now to FIG. 7, similar to the quarter rate codec 26, the bitstream of the eighth-rate codec 28 includes an LSF component 240 and an energy component 242. The LSF component 240 may be encoded using a similar LSF quantization scheme as the full-rate codec 22, when the frame is classified as Type One. The eighth-rate codec 28 utilizes a plurality of stages 244 to encode the short-term predictor or spectral representation of a frame. One embodiment of the LSF component 240 is encoded using 11 bits per frame in three stages 244. Two of the three stages 244 are encoded in 4 bits and the last of the three stages 244 is encoded in 3 bits.

The quantization approach to generate the LSF component 240 for the eighth-rate codec 28 involves an LSF prediction error quantization table and a predictor coefficients table similar to the full-rate codec 22. The LSF prediction error quantization table and the LSF predictor coefficients table can be generally represented by the previously discussed Tables 1 and 2. In an example embodiment, the LSF quantization table for the eighth-rate codec 28 includes 3 stages (j=3) with 16 quantization vectors in two stages (r=16) and 8 quantization vectors in one stage (s=8) each having 10 elements (n=10). The predictor coefficient table of one embodiment includes 4 vectors (m=4) of 10 elements each (n=10). The quantization table and the predictor coefficients table of one embodiment are titled "Float64 CBes_08k" and "Float64 B_08k", respectively, and are included in Appendix B of the attached microfiche appendix.

In FIG. 2, the LSF component 240 is encoded on a frame basis by the initial eighth frame-processing module 52. The energy component 242 also is encoded on a frame basis by the eighth-rate module 62. The energy component 242 represents an energy gain that can be determined and coded similarly to the quarter rate codec 26. One embodiment of the energy component 242 is represented by 5 bits per frame

represented by the following matrix: as illustrated in FIG. 7. 
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Similar to the quarter rate codec 26, the energy gain and 
the predicted energy gain may be used to determine an 
energy prediction error. The energy prediction error is quan- 
tized using an energy gain quantizer and a plurality of 
predetermined scalars in an energy gain quantization table. 
The energy gain quantization table may be generally repre- 
sented by Table 9 as previously discussed. The energy gain 
quantizer of one embodiment uses an energy gain quanti- 
zation table containing 32 vectors (n«32) that is entitled 
"Float64 gainSQ_l __32" and is included in Appendix B of 
the attached microfiche appendix. 
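
The energy gain quantization described above may be illustrated with a short sketch. It is a minimal example only, assuming a hypothetical uniformly spaced 32-entry scalar table (the actual table values reside in Appendix B and are not reproduced here) and assuming that the prediction error is quantized to the nearest table entry; the function names are illustrative.

```python
# Hypothetical 32-entry scalar table (5-bit index); NOT the Appendix B values.
TABLE = [-4.0 + 0.25 * i for i in range(32)]


def quantize_scalar(value, table):
    """Return the index of the table entry closest to value."""
    return min(range(len(table)), key=lambda i: abs(table[i] - value))


def encode_energy_gain(gain, predicted_gain, table=TABLE):
    """Quantize the prediction error (gain minus predicted gain) to an index."""
    return quantize_scalar(gain - predicted_gain, table)


def decode_energy_gain(index, predicted_gain, table=TABLE):
    """Rebuild the gain from the index and the same predicted gain."""
    return predicted_gain + table[index]


index = encode_energy_gain(gain=3.1, predicted_gain=2.0)
print(index)                           # 20
print(decode_energy_gain(index, 2.0))  # 3.0
```

Because the decoder can form the same predicted gain from past frames, only the 5-bit index needs to be carried in the bitstream in this sketch.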

In FIG. 3, the LSF component 240 and the energy 
component 242 may be decoded following receipt by the 
decoding system 16. The LSF component 240 and the 
energy component 242 are decoded by the E LPC recon- 
struction module 126 and the E excitation reconstruction 
module 124, respectively. Decoding of the LSF component 
240 is similar to the full-rate codec 22 for frames classified 
as Type One. The energy component 242 may be decoded by 
applying the decoded energy gain to a vector of similar yet 
random numbers as in the quarter rate codec 26. 

An embodiment of the speech compression system 10 is 
capable of creating and then decoding a bitstream using one 
of the four codecs 22, 24, 26 and 28. The bitstream generated 
by a particular codec 22, 24, 26 and 28 may be encoded 
emphasizing different parameters of the speech signal 18 
within a frame depending on the rate selection and the type 
classification. Accordingly, perceptual quality of the post- 
processed synthesized speech 20 decoded from the bitstream 
may be optimized while maintaining the desired average bit 
rate. 

A detailed discussion of the configuration and operation 
of the speech compression system modules illustrated in the 
embodiments of FIGS. 2 and 3 is now provided. The reader 
is encouraged to review the source code included in Appen- 
dix A of the attached microfiche appendix in conjunction 
with the discussion to further enhance understanding. 
2.0 Pre-Processing Module 

Referring now to FIG. 8, an expanded block diagram of 
the pre-processing module 34 illustrated in FIG. 2 is pro- 
vided. One embodiment of the pre-processing module 34 
includes a silence enhancement module 302, a high-pass 
filter module 304, and a noise suppression module 306. The 
pre-processing module 34 receives the speech signal 18 and 
provides a pre-processed speech signal 308. 

The silence enhancement module 302 receives the speech 
signal 18 and functions to track the minimum noise resolution.
The silence enhancement function adaptively tracks the
minimum resolution and levels of the speech signal 18 
around zero, and detects whether the current frame may be 
"silence noise." If a frame of "silence noise" is detected, the 
speech signal 18 may be ramped to the zero-level. 
Otherwise, the speech signal 18 may not be modified. For 
example, the A-law coding scheme can transform such an 
inaudible "silence noise" into a clearly audible noise. A-law 
encoding and decoding of the speech signal 18 prior to the 
pre-processing module 34 can amplify sample values that 
are nearly 0 to values of about +8 or -8 thereby transforming 
a nearly inaudible noise into an audible noise. After pro- 
cessing by the silence enhancement module 302, the speech 
signal 18 may be provided to the high-pass filter module
304. 

The high-pass filter module 304 may be a 2nd order
pole-zero filter, and may be given by the following transfer
function H(z):

$$H(z) = \frac{0.92727435 - 1.8544941\,z^{-1} + 0.92727435\,z^{-2}}{1 - 1.9059465\,z^{-1} + 0.9114024\,z^{-2}}$$ (Equation 1)

The input may be scaled down by a factor of 2 during the 
high-pass filtering by dividing the coefficients of the 
numerator by 2. 
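
The filter of Equation 1 may be sketched as a direct-form difference equation. This is a minimal illustration, assuming the transfer function H(z) has numerator coefficients 0.92727435, -1.8544941, 0.92727435 and denominator coefficients 1, -1.9059465, 0.9114024; the function and variable names are hypothetical and the state handling is simplified to a single call.

```python
# Coefficients of the 2nd order pole-zero high-pass filter (Equation 1).
B = [0.92727435, -1.8544941, 0.92727435]   # numerator
A = [1.0, -1.9059465, 0.9114024]           # denominator


def high_pass(samples, scale_input=True):
    """Filter samples with H(z); optionally halve the numerator (input / 2)."""
    b = [c / 2.0 for c in B] if scale_input else B
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x0 in samples:
        y0 = b[0] * x0 + b[1] * x1 + b[2] * x2 - A[1] * y1 - A[2] * y2
        out.append(y0)
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return out


# A constant (DC) input is almost entirely removed once the transient decays.
dc_response = high_pass([1000.0] * 2000)
print(abs(dc_response[-1]) < 10.0)   # True
```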
Following processing by the high-pass filter, the speech
signal 18 may be passed to the noise suppression module
306. The noise suppression module 306 employs noise
subtraction in the frequency domain and may be one of the
many well-known techniques for suppressing noise. The
noise suppression module 306 may include a Fourier transform
program used by a noise suppression algorithm as
described in section 4.1.2 of the TIA/EIA IS-127 standard
entitled "Enhanced Variable Rate Codec, Speech Service
Option 3 for Wideband Spread Spectrum Digital Systems."
The noise suppression module 306 of one embodiment
transforms each frame of the speech signal 18 to the frequency
domain where the spectral amplitudes may be separated
from the spectral phases. The spectral amplitudes may
be grouped into bands, which follow the human auditory
channel bands. An attenuation gain may be calculated for
each band. The attenuation gains may be calculated with less
emphasis on the spectral regions that are likely to have
harmonic structure. In such regions, the background noise
may be masked by the strong voiced speech. Accordingly,
any attenuation of the speech can distort the quality of the
original speech, without any perceptual improvement in the
reduction of the noise.

Following calculation of the attenuation gain, the spectral 
amplitudes in each band may be multiplied by the attenua- 
tion gain. The spectral amplitudes may then be combined 
with the original spectral phases, and the speech signal 18 
may be transformed back to the time domain. The time-domain
signal may be overlapped-and-added to generate the
pre-processed speech signal 308. The pre-processed speech 
signal 308 may be provided to the initial frame-processing 
module 44. 
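
The step of scaling the spectral amplitudes of each band while retaining the original phases may be sketched as follows. This is an illustrative example only: the band edges and gain values are hypothetical, the calculation of the attenuation gains themselves is not shown, and the forward and inverse transforms and the overlap-and-add are assumed to be performed elsewhere.

```python
import cmath


def apply_band_gains(spectrum, band_edges, band_gains):
    """Scale the magnitude of each bin by its band gain; keep the phase.

    spectrum   -- list of complex frequency bins
    band_edges -- list of (start, stop) bin index pairs, stop exclusive
    band_gains -- one attenuation gain per band
    """
    out = list(spectrum)
    for (start, stop), gain in zip(band_edges, band_gains):
        for i in range(start, stop):
            magnitude, phase = cmath.polar(spectrum[i])
            out[i] = cmath.rect(magnitude * gain, phase)
    return out


# Hypothetical 4-bin spectrum split into two bands.
bins = [1 + 1j, 2j, -3 + 0j, 4 + 0j]
attenuated = apply_band_gains(bins, [(0, 2), (2, 4)], [0.5, 1.0])
```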
3.0 Initial Frame Processing Module

FIG. 9 is a block diagram of the initial frame-processing
module 44, illustrated in FIG. 2. One embodiment of the
initial frame-processing module 44 includes an LSF generation
section 312, a perceptual weighting filter module 314,
an open loop pitch estimation module 316, a characterization
section 318, a rate selection module 320, a pitch
pre-processing module 322, and a type classification module
324. The characterization section 318 further comprises a
voice activity detection (VAD) module 326 and a characterization
module 328. The LSF generation section 312
comprises an LPC analysis module 330, an LSF smoothing
module 332, and an LSF quantization module 334. In
addition, within the full-rate encoder 36, the LSF generation
section 312 includes an interpolation module 338 and within
the half-rate encoder 38, the LSF generation section includes
a predictor switch module 336.

Referring to FIG. 2, the initial frame-processing module
44 operates to generate the LSF components 140, 172, 202
and 240, as well as determine the rate selection and the type
classification. The rate selection and type classification
control the processing by the excitation-processing module
54. The initial frame-processing module 44 illustrated in
FIG. 9 is illustrative of one embodiment of the initial full
frame-processing module 46 and the initial half frame-processing
module 48. Embodiments of the initial quarter
frame-processing module 50 and the initial eighth frame-processing
module 52 differ to some degree.

As previously discussed, in one embodiment, type classification
does not occur for the initial quarter-rate frame-processing
module 50 and the initial eighth-rate frame-processing
module 52. In addition, the long-term predictor
and the long-term predictor residual are not processed 
separately to represent the energy component 204 and 242 
illustrated in FIGS. 6 and 7. Accordingly, only the LSF 
section 312, the characterization section 318 and the rate 
selection module 320 illustrated in FIG. 9 are operable 
within the initial quarter-rate frame-processing module 50 
and the initial eighth-rate frame-processing module 52. 

To facilitate understanding of the initial frame-processing
module 44, a general overview of the operation will first be 
discussed followed by a detailed discussion. Referring now 
to FIG. 9, the pre-processed speech signal 308 initially is 
provided to the LSF generation section 312, the perceptual 
weighting filter module 314 and the characterization section 
318. However, some of the processing within the character- 
ization section 318 is dependent on the processing that 
occurs within the open loop pitch estimation module 316. 
The LSF generation section 312 estimates and encodes the 
spectral representation of the pre-processed speech signal 
308. The perceptual weighting filter module 314 operates to 
provide perceptual weighting during coding of the pre- 
processed speech signal 308 according to the natural mask- 
ing that occurs during processing by the human auditory 
system. The open loop pitch estimation module 316 deter- 
mines the open loop pitch lag for each frame. The charac- 
terization section 318 analyzes the frame of the pre- 
processed speech signal 308 and characterizes the frame to 
optimize subsequent processing. 

During, and following, the processing by the character- 
ization section 318, the resulting characterizations of the 
frame may be used by the pitch pre-processing module 322 
to generate parameters used in generation of the closed loop 
pitch lag. In addition, the characterization of the frame is 
used by the rate selection module 320 to determine the rate 
selection. Based on parameters of the pitch lag determined 
by the pitch pre-processing module 322 and the 
characterizations, the type classification is determined by the 
type classification module 324. 
3.1 LPC Analysis Module 

The pre-processed speech signal 308 is received by the 
LPC analysis module 330 within the LSF generation section 
312. The LPC analysis module 330 determines the short-term
prediction parameters used to generate the LSF component
312. Within one embodiment of the LPC analysis
module 330, there are three 10th order LPC analyses performed
for a frame of the pre-processed speech signal 308.
The analyses may be centered within the second quarter of 
the frame, the fourth quarter of the frame, and a lookahead. 
The lookahead is a speech segment that overhangs into the 
next frame to reduce transitional effects. The analysis within 
the lookahead includes samples from the current frame and 
from the next frame of the pre-processed speech signal 308. 

Different windows may be used for each LPC analysis 
within a frame to calculate the linear prediction coefficients. 
The LPC analyses in one embodiment are performed using 
the autocorrelation method to calculate autocorrelation coef- 
ficients. The autocorrelation coefficients may be calculated 
from a plurality of data samples within each window. During 
the LPC analysis, bandwidth expansion of 60 Hz and a white 
noise correction factor of 1.0001 may be applied to the 
autocorrelation coefficients. The bandwidth expansion pro- 
vides additional robustness against signal and round-off 
errors during subsequent encoding. The white noise correction
factor effectively adds a noise floor of -40 dB to reduce
the spectral dynamic range and further mitigate errors during
subsequent encoding.
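
The conditioning of the autocorrelation coefficients may be sketched as shown below. This is a minimal example under stated assumptions: the analysis window is omitted, the 60 Hz bandwidth expansion is assumed to take the form of a Gaussian lag window, and a sampling rate of 8000 Hz is assumed; neither the lag window formula nor the sampling rate is given in this section, and the function names are hypothetical. The white noise correction factor of 1.0001 raises the zero-lag coefficient by a factor of 0.0001, which corresponds to 10·log10(0.0001) = -40 dB.

```python
import math


def autocorrelation(frame, order):
    """Autocorrelation coefficients r[0..order] of an (already windowed) frame."""
    return [sum(frame[n] * frame[n - k] for n in range(k, len(frame)))
            for k in range(order + 1)]


def condition_autocorrelation(r, bandwidth_hz=60.0, sample_rate_hz=8000.0,
                              white_noise_factor=1.0001):
    """Apply an assumed Gaussian lag window, then the white noise correction."""
    out = []
    for k, value in enumerate(r):
        lag_window = math.exp(
            -0.5 * (2.0 * math.pi * bandwidth_hz * k / sample_rate_hz) ** 2)
        out.append(value * lag_window)
    out[0] *= white_noise_factor   # adds a noise floor relative to r[0]
    return out


r = autocorrelation([1.0, 2.0, 3.0, 4.0], order=2)
print(r)   # [30.0, 20.0, 11.0]
conditioned = condition_autocorrelation(r)
```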

A plurality of reflection coefficients may be calculated
using a Leroux-Gueguen algorithm from the autocorrelation
coefficients. The reflection coefficients may then be converted
to the linear prediction coefficients. The linear prediction
coefficients may be further converted to the LSFs
(Line Spectrum Frequencies), as previously discussed. The
LSFs calculated within the fourth quarter may be quantized
and sent to the decoding system 16 as the LSF component
140, 172, 202, 240. The LSFs calculated within the second
quarter may be used to determine the interpolation path for
the full-rate encoder 36 for frames classified as Type Zero.
The interpolation path is selectable and may be identified
with the interpolation element 158. In addition, the LSFs
calculated within the second quarter and the lookahead may
be used in the encoding system 12 to generate the short-term
residual and a weighted speech that will be described later.
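
The relationship between autocorrelation coefficients, reflection coefficients and linear prediction coefficients may be illustrated with a short sketch. The example below substitutes the Levinson-Durbin recursion for the Leroux-Gueguen algorithm named above; it is not the method of the appendix source code, and is used here only because it yields both the reflection coefficients and the linear prediction coefficients in a few lines of floating-point code. The sign convention, A(z) = 1 + a1·z^-1 + ... , is an assumption.

```python
def levinson_durbin(r, order):
    """Return (lpc, reflection, error) for autocorrelation r[0..order].

    lpc[0] is 1.0, so that A(z) = 1 + lpc[1] z^-1 + ... + lpc[order] z^-order.
    """
    a = [1.0] + [0.0] * order
    reflection = []
    error = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / error                 # i-th reflection coefficient
        reflection.append(k)
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        error *= (1.0 - k * k)           # remaining prediction error energy
    return a, reflection, error


# Autocorrelation of a first-order process: r[k] = 0.5 ** k.
lpc, refl, err = levinson_durbin([1.0, 0.5, 0.25], order=2)
```

For this toy input the second reflection coefficient is zero, reflecting that a first-order predictor already captures the correlation.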

3.2 LSF Smoothing Module 

During stationary background noise, the LSFs calculated
within the fourth quarter of the frame may be smoothed by
the LSF smoothing module 332 prior to quantizing the LSFs.
The LSFs are smoothed to better preserve the perceptual
characteristic of the background noise. The smoothing is
controlled by a voice activity determination provided by the
VAD module 326 that will be later described and an analysis
of the evolution of the spectral representation of the frame.
An LSF smoothing factor is denoted $\beta_{lsf}$. In an example
embodiment:

1. At the beginning of "smooth" background noise
segments, the smoothing factor may be ramped quadratically
from 0 to 0.9 over 5 frames.

2. During "smooth" background noise segments the
smoothing factor may be 0.9.

3. At the end of "smooth" background noise segments the
smoothing factor may be reduced to 0 instantaneously.

4. During non-"smooth" background noise segments the
smoothing factor may be 0.

According to the LSF smoothing factor the LSFs for the
quantization may be calculated as:

$$lsf_n(k) = \beta_{lsf} \cdot lsf_{n-1}(k) + (1 - \beta_{lsf}) \cdot lsf_2(k), \quad k = 1, 2, \ldots, 10$$ (Equation 2)

where $lsf_n(k)$ and $lsf_{n-1}(k)$ represent the smoothed LSFs of
the current and previous frame, respectively, and $lsf_2(k)$
represents the LSFs of the LPC analysis centered at the last
quarter of the current frame.
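
The smoothing rules and the recursion of Equation 2 may be sketched as follows. The example assumes that Equation 2 is a first-order recursion that weights the previous smoothed LSFs by the smoothing factor and the current LSFs by one minus the smoothing factor. The exact shape of the quadratic ramp is not specified above, so the form 0.9·(i/5)² for the i-th frame of a "smooth" segment is an assumption, as are the function names.

```python
def smoothing_factor(frames_in_smooth_segment, ramp_frames=5, target=0.9):
    """Smoothing factor for the current frame.

    frames_in_smooth_segment -- 1 for the first frame of a "smooth" background
    noise segment, 2 for the second, and so on; 0 or negative when the frame
    is not inside a "smooth" segment (the factor then drops to 0 at once).
    """
    if frames_in_smooth_segment <= 0:
        return 0.0
    t = min(frames_in_smooth_segment, ramp_frames) / ramp_frames
    return target * t * t   # assumed quadratic ramp


def smooth_lsf(previous_smoothed, current, beta):
    """One step of the assumed recursion of Equation 2 for all LSFs."""
    return [beta * p + (1.0 - beta) * c
            for p, c in zip(previous_smoothed, current)]


ramp = [smoothing_factor(i) for i in range(1, 7)]
smoothed = smooth_lsf([100.0, 200.0], [200.0, 400.0], beta=0.9)
```

With a factor of 0 the recursion returns the current LSFs unchanged, which matches rules 3 and 4 of the list.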

3.3 LSF Quantization Module 

The 10th order LPC model given by the smoothed LSFs
(Equation 2) may be quantized in the LSF domain by the
LSF quantization module 334. The quantized value is a
plurality of quantized LPC coefficients $A_q(z)$ 342. The
quantization scheme uses an nth order moving average
predictor. In one embodiment, the quantization scheme uses
a 2nd order moving average predictor for the full-rate codec
22 and the quarter-rate codec 26. For the half-rate codec 24,
a 4th order moving average switched predictor may be used.
For the eighth-rate codec 28, a 4th order moving average
predictor may be used. The quantization of the LSF prediction
error may be performed by multi-stage codebooks, in
the respective codecs as previously discussed.
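
A moving average prediction followed by a multi-stage codebook search may be sketched as shown below. This is a simplified illustration rather than the appendix implementation: the codebooks and predictor values are toy data, mean removal is omitted, the stages are searched greedily one after another, and the weights stand for a per-coefficient weighting of the squared error. All names are hypothetical.

```python
def ma_prediction_error(lsf, past_errors, predictor):
    """Subtract a moving average prediction formed from past quantized errors.

    predictor   -- m vectors of n coefficients (one vector per past frame)
    past_errors -- m past quantized prediction error vectors of n elements
    """
    predicted = [sum(b[k] * e[k] for b, e in zip(predictor, past_errors))
                 for k in range(len(lsf))]
    return [x - p for x, p in zip(lsf, predicted)]


def weighted_sq_error(target, candidate, weights):
    return sum(w * (t - c) ** 2 for t, c, w in zip(target, candidate, weights))


def multistage_quantize(target, stages, weights):
    """Greedy search: each stage quantizes the residual left by earlier stages."""
    residual = list(target)
    indices = []
    for codebook in stages:
        best = min(range(len(codebook)),
                   key=lambda i: weighted_sq_error(residual, codebook[i], weights))
        indices.append(best)
        residual = [r - c for r, c in zip(residual, codebook[best])]
    return indices, residual


# Toy two-element example with two stages of two vectors each.
stages = [[[0.0, 0.0], [1.0, 1.0]], [[0.0, 0.0], [0.25, -0.25]]]
indices, residual = multistage_quantize([1.2, 0.8], stages, [1.0, 1.0])
print(indices)   # [1, 1]
```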

The error criterion for the LSF quantization is a weighted
mean squared error measure. The weighting for the weighted
mean squared error is a function of the LPC magnitude
spectrum. Accordingly, the objective of the quantization
may be given by:

$$\{\widehat{lsf}_n(1), \widehat{lsf}_n(2), \ldots, \widehat{lsf}_n(10)\} = \arg\min \left\{ \sum_{k=1}^{10} w_k \cdot \left( lsf_n(k) - \widehat{lsf}_n(k) \right)^2 \right\}$$ (Equation 3)

where the weighting may be:

$$w_k = \left| P(lsf_n(k)) \right|^{0.4}$$ (Equation 4)

and |P(f)| is the LPC power spectrum at frequency f (the
index n denotes the frame number). In the example
embodiment, there are 10 coefficients.

In one embodiment, the ordering property of the quantized
LPC coefficients $A_q(z)$ 342 is checked. If one LSF pair
is flipped, the pair may be re-ordered. When two or more LSF
pairs are flipped, the quantized LPC coefficients $A_q(z)$ 342
may be declared erased and may be reconstructed using the
frame erasure concealment of the decoding system 16 that
will be discussed later. In one embodiment, a minimum
spacing of 50 Hz between adjacent coefficients of the
quantized LPC coefficients $A_q(z)$ 342 may be enforced.

3.4 Predictor Switch Module

The predictor switch module 336 is operable within the
half-rate codec 24. The predicted LSFs may be generated
using moving average predictor coefficients as previously
discussed. The predictor coefficients determine how much of
the LSFs of past frames are used to predict the LSFs of the
current frame. The predictor switch module 336 is coupled
with the LSF quantization module 334 to provide the
predictor coefficients that minimize the quantization error as
previously discussed.

3.5 LSF Interpolation Module

The quantized and unquantized LSFs may also be interpolated
for each subframe within the full-rate codec 22. The
quantized and unquantized LSFs are interpolated to provide
quantized and unquantized linear prediction parameters for
each subframe. The LSF interpolation module 338 chooses
an interpolation path for frames of the full-rate codec 22
with the Type Zero classification, as previously discussed.
For all other frames, a predetermined linear interpolation
path may be used.

The LSF interpolation module 338 analyzes the LSFs of
the current frame with respect to the LSFs of previous
frames and the LSFs that were calculated at the second
quarter of the frame. An interpolation path may be chosen
based on the degree of variations in the spectral envelope
between the subframes. The different interpolation paths
adjust the weighting of the LSFs of the previous frame and
the weighting of the LSFs of the current frame for the
current subframe as previously discussed. Following adjustment
by the LSF interpolation module 338, the interpolated
LSFs may be converted to predictor coefficients for each
subframe.

For Type One classification within the full-rate codec 22,
as well as for the half-rate codec 24, the quarter-rate codec
26, and the eighth-rate codec 28, the predetermined linear
interpolation path may be used to adjust the weighting. The
interpolated LSFs may be similarly converted to predictor
coefficients following interpolation. In addition, the predictor
coefficients may be further weighted to create the coefficients
that are used by the perceptual weighting filter module
314.

3.6 Perceptual Weighting Filter Module

The perceptual weighting filter module 314 is operable to
receive and filter the pre-processed speech signal 308.
Filtering by the perceptual weighting filter module 314 may
be performed by emphasizing the valley areas and
de-emphasizing the peak areas of the pre-processed speech
signal 308. One embodiment of the perceptual weighting
filter module 314 has two parts. The first part may be the
traditional pole-zero filter given by:

$$W_1(z) = \frac{A(z/\gamma_1)}{A(z/\gamma_2)}$$ (Equation 5)

where $A(z/\gamma_1)$ and $1/A(z/\gamma_2)$ are a zeros-filter and a poles-filter,
respectively. The prediction coefficients for the zeros-filter
and the poles-filter may be obtained from the interpolated
LSFs for each subframe and weighted by $\gamma_1$ and $\gamma_2$,
respectively. In an example embodiment of the perceptual
weighting filter module 314, the weighting is $\gamma_1$=0.9 and
$\gamma_2$=0.5. The second part of the perceptual weighting filter
module 314 may be an adaptive low-pass filter given by:

(Equation 6)

where $\eta$ is a function of stationary long-term spectral
characteristics that will be later discussed. In one
embodiment, if the stationary long-term spectral characteristics
have the typical tilt associated with public switched
telephone network (PSTN), then $\eta$=0.2, otherwise, $\eta$=0.0.
The typical tilt is commonly referred to as a modified IRS 
characteristic or spectral tilt. Following processing by the 
perceptual weighting filter module 314, the pre-processed 
speech signal 308 may be described as a weighted speech 
344. The weighted speech 344 is provided to the open loop 
pitch estimation module 316. 
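
The first part of the perceptual weighting filter may be sketched as follows. The example assumes the pole-zero form W1(z) = A(z/γ1)/A(z/γ2) with γ1 = 0.9 and γ2 = 0.5, and uses the standard property that the i-th coefficient of A(z/γ) is the i-th coefficient of A(z) multiplied by γ^i. The adaptive low-pass second part is not included, the prediction coefficients are toy values, and the function names are hypothetical.

```python
def weight_coefficients(lpc, gamma):
    """Coefficients of A(z/gamma): the i-th coefficient is scaled by gamma**i."""
    return [a * gamma ** i for i, a in enumerate(lpc)]


def pole_zero_filter(signal, numerator, denominator):
    """Direct-form filter; denominator[0] is assumed to be 1."""
    out = []
    for n in range(len(signal)):
        acc = sum(numerator[i] * signal[n - i]
                  for i in range(len(numerator)) if n - i >= 0)
        acc -= sum(denominator[i] * out[n - i]
                   for i in range(1, len(denominator)) if n - i >= 0)
        out.append(acc)
    return out


def perceptual_weighting(signal, lpc, gamma1=0.9, gamma2=0.5):
    """Apply the zeros-filter A(z/gamma1) and the poles-filter 1/A(z/gamma2)."""
    return pole_zero_filter(signal,
                            weight_coefficients(lpc, gamma1),
                            weight_coefficients(lpc, gamma2))


# Impulse response for a toy first-order predictor A(z) = 1 - 0.8 z^-1.
response = perceptual_weighting([1.0, 0.0, 0.0], [1.0, -0.8])
```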
3.7 Open Loop Pitch Estimation Module 
The open loop pitch estimation module 316 generates the
open loop pitch lag for a frame. In one embodiment, the 
open loop pitch lag actually comprises three open loop pitch 
lags, namely, a first pitch lag for the first half of the frame, 
a second pitch lag for the second half of the frame, and a 
third pitch lag for the lookahead portion of the frame. 

For every frame, the second and third pitch lags are 
estimated by the open loop pitch estimation module 316 
based on the current frame. The first open loop pitch lag is 
the third open loop pitch lag (the lookahead) from the 
previous frame that may be further adjusted. The three open 
loop pitch lags are smoothed to provide a continuous pitch 
contour. The smoothing of the open loop pitch lags employs 
a set of heuristic and ad-hoc decision rules to preserve the
optimal pitch contour of the frame. The open-loop pitch
estimation is based on the weighted speech 344 denoted by
$s_w(n)$. The values estimated by the open loop pitch estimation
module 316 in one embodiment are lags that range from
17 to 148.

The first, second and third open loop pitch lags may be
determined using a normalized correlation, R(k), that may be
calculated according to

$$R(k) = \frac{\sum_{n=0}^{79} s_w(n) \cdot s_w(n-k)}{\sqrt{\left( \sum_{n=0}^{79} s_w^2(n) \right) \left( \sum_{n=0}^{79} s_w^2(n-k) \right)}}$$ (Equation 7)

where the summations run to n=79 in the example embodiment to
represent the number of samples in the subframe. The maximum
normalized correlation R(k) for each of a plurality of regions is
determined. The regions may be four regions that represent
four sub-ranges within the range of possible lags. For
example, a first region from 17-33 lags, a second region 
from 34-67 lags, a third region from 68-137 lags, and a 
fourth region from 138-148 lags. Within each region, the lag
that maximizes the normalized correlation R(k) is taken as an
initial pitch lag candidate, giving one candidate per region.
A best candidate from the initial pitch lag
candidates is selected based on the normalized correlation, 
characterization information, and the history of the open 
loop pitch lag. This procedure may be performed for the 
second pitch lag and for the third pitch lag. 
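
The search for one pitch lag candidate per region may be sketched as shown below. The example assumes the conventional normalized correlation (cross-correlation divided by the square root of the product of the two segment energies) computed over 80 samples; the selection of the best candidate from characterization information and lag history is not shown. The `start` argument, which marks where the analysed segment begins in the buffer, and the function names are hypothetical.

```python
import math

# Lag regions from the example embodiment.
REGIONS = [(17, 33), (34, 67), (68, 137), (138, 148)]


def normalized_correlation(s, start, length, lag):
    """Normalized correlation between a segment and its copy delayed by lag."""
    num = sum(s[start + n] * s[start + n - lag] for n in range(length))
    e0 = sum(s[start + n] ** 2 for n in range(length))
    e1 = sum(s[start + n - lag] ** 2 for n in range(length))
    denom = math.sqrt(e0 * e1)
    return num / denom if denom > 0.0 else 0.0


def region_candidates(s, start, length=80):
    """Return one (lag, correlation) candidate per region; needs start >= 148."""
    candidates = []
    for low, high in REGIONS:
        best = max(range(low, high + 1),
                   key=lambda k: normalized_correlation(s, start, length, k))
        candidates.append((best, normalized_correlation(s, start, length, best)))
    return candidates


# Toy weighted speech: a sinusoid with a period of 40 samples.
weighted = [math.sin(2.0 * math.pi * n / 40.0) for n in range(320)]
candidates = region_candidates(weighted, start=200)
```

For this periodic toy signal the second region yields a lag of 40 and the third region a multiple of 40, which shows why the later selection among candidates is needed to avoid pitch multiples.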

Finally, the first, second, and third open loop pitch lags 
may be adjusted for an optimal fitting to the overall pitch 
contour and form the open loop pitch lag for the frame. The 
open loop pitch lag is provided to the pitch pre-processing 
module 322 for further processing that will be described 
later. The open loop pitch estimation module 316 also 
provides the pitch lag and normalized correlation values at 
the pitch lag. The normalized correlation values at the pitch 
lag are called a pitch correlation and are notated as $R_p$. The
pitch correlation $R_p$ is used in characterizing the frame
within the characterization section 318. 

3.8 Characterization Section 

The characterization section 318 is operable to analyze 
and characterize each frame of the pre-processed speech 
signal 308. The characterization information is utilized by a 
plurality of modules within the initial frame-processing
module 44 as well as by the excitation-processing module 54.
Specifically, the characterization information is used in the 
rate selection module 320 and the type classification module 
324. In addition, the characterization information may be 
used during quantization and coding, particularly in empha- 
sizing the perceptually important features of the speech 
using a class-dependent weighting approach that will be 
described later. 

Characterization of the pre-processed speech signal 308 
by the characterization section 318 occurs for each frame. 
Operation of one embodiment of the characterization section 
318 may be generally described as six categories of analysis 
of the pre-processed speech signal 308. The six categories 
are: voice activity determination, the identification of 
unvoiced noise-like speech, a 6-class signal 
characterization, derivation of a noise-to-signal ratio, a
4-grade characterization, and a characterization of a station- 
ary long term spectral characteristic. 

3.9 Voice Activity Detection (VAD) Module 

The voice activity detection (VAD) module 326 performs 
voice activity determination as the first step in characteriza- 
tion. The VAD module 326 operates to determine if the 
pre-processed speech signal 308 is some form of speech or 
if it is merely silence or background noise. One embodiment 
of the VAD module 326 detects voice activity by tracking 
the behavior of the background noise. The VAD module 326 
monitors the difference between parameters of the current 
frame and parameters representing the background noise. 
Using a set of predetermined threshold values, the frame 
may be classified as a speech frame or as a background noise 
frame. 

The VAD module 326 operates to determine the voice
activity based on monitoring a plurality of parameters, such
as the maximum of the absolute value of the samples in the
frame, as well as the reflection coefficients, the prediction
error, the LSFs and the 10th order autocorrelation coefficients
provided by the LPC analysis module 330. In
addition, an example embodiment of the VAD module 326
uses the parameters of the pitch lag and the adaptive
codebook gain from recent frames. The pitch lags and the
adaptive codebook gains used by the VAD module 326 are
from the previous frames since pitch lags and adaptive
codebook gains of the current frame are not yet available.
The voice activity determination performed by the VAD
module 326 may be used to control several aspects of the
encoding system 12, as well as forming part of a final class
characterization decision by the characterization module
328.

3.10 Characterization Module 

Following the voice activity determination by the VAD 
module 326, the characterization module 328 is activated. 
The characterization module 328 performs the second, third, 
fourth and fifth categories of analysis of the pre-processed 
speech signal 308 as previously discussed. The second 
category is the detection of unvoiced noise-like speech 
frames. 

3.10.1 Unvoiced Noise-Like Speech Detection 
In general, unvoiced noise-like speech frames do not
include a harmonic structure, whereas voiced frames do. The
detection of an unvoiced noise-like speech frame, in one
embodiment, is based on the pre-processed speech signal
308, and a weighted residual signal $R_w(z)$ given by:

$$R_w(z) = A(z/\gamma_1) \cdot S(z)$$ (Equation 8)

where $A(z/\gamma_1)$ represents a weighted zeros-filter with the
weighting $\gamma_1$ and S(z) is the pre-processed speech signal
308. A plurality of parameters, such as the following six
parameters, may be used to determine if the current frame is
unvoiced noise-like speech:

1. The energy of the pre-processed speech signal 308 over 
the first 2 A of the frame. 

2. A count of the speech samples within the frame that are 
under a predetermined threshold.

3. A residual sharpness determined using a weighted 
residual signal and the frame size. The sharpness is 
given by the ratio of the average of the absolute values 
of the samples to the maximum of the absolute values 
of the samples. The weighted residual signal may be 
determined from Equation 8. 

4. A first reflection coefficient representing the tilt of the 
magnitude spectrum of the pre-processed speech signal
308. 

5. The zero crossing rate of the pre-processed speech 
signal 308. 

6. A prediction measurement between the pre-processed 
speech signal 308 and the weighted residual signal. 

In one embodiment, a set of predetermined threshold 
values are compared to the above listed parameters in 
making the determination of whether a frame is unvoiced 
noise-like speech. The resulting determination may be used 
in controlling the pitch pre-processing module 322, and in 
the fixed codebook search, both of which will be described 
later. In addition, the unvoiced noise-like speech determination
is used in determining the 6-class signal characterization
of the pre-processed speech signal 308.
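
Three of the listed parameters may be sketched directly from their descriptions: the count of samples under a threshold, the sharpness ratio, and the zero crossing rate. This is an illustrative example only; the threshold values and the way the parameters are combined into a decision are not specified in this section, and the function names are hypothetical.

```python
def count_below(samples, threshold):
    """Number of samples whose absolute value is under the threshold."""
    return sum(1 for x in samples if abs(x) < threshold)


def sharpness(samples):
    """Average absolute value divided by maximum absolute value (0 to 1)."""
    peak = max(abs(x) for x in samples)
    if peak == 0.0:
        return 0.0
    return (sum(abs(x) for x in samples) / len(samples)) / peak


def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(samples, samples[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(samples) - 1)


# A flat envelope gives a ratio near 1; a single pulse gives a small ratio.
print(sharpness([1.0, -1.0, 1.0, -1.0]))   # 1.0
print(sharpness([0.0, 0.0, 0.0, 4.0]))     # 0.25
```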

3.10.2 6-Class Signal Characterization 

The characterization module 328 may also perform the 
third category of analysis that is the 6-class signal charac- 
terization. The 6-class signal characterization is performed 
by characterizing the frame into one of 6 classes according 
to the dominant features of the frame. In one embodiment, 
the 6 classes may be described as: 

0. Silence/Background Noise 

1. Stationary Noise-Like Unvoiced Speech 

2. Non-Stationary Unvoiced 




3. Onset 

4. Non-Stationary Voiced 

5. Stationary Voiced 

In an alternative embodiment, other classes are also 
included such as frames characterized as plosive. Initially, 
the characterization module 328 distinguishes between 
silence/background noise frames (class 0), non-stationary 
unvoiced frames (class 2), onset frames (class 3), and voiced 
frames represented by class 4 and 5. Characterization of 
voiced frames as Non-Stationary (class 4) and Stationary 
(class 5) may be performed during activation of the pitch 
pre-processing module 322. Furthermore, the characteriza- 
tion module 328 may not initially distinguish between 
stationary noise-like unvoiced frames (class 1) and non- 
stationary unvoiced frames (class 2). This characterization 
class may also be identified during processing by the pitch 
pre-processing module 322 using the determination by the 
unvoiced noise-like speech algorithm previously discussed.

The characterization module 328 performs characteriza- 
tion using, for example, the pre-processed speech signal 308 
and the voice activity detection by the VAD module 326. In 
addition, the characterization module 328 may utilize the 
open loop pitch lag for the frame and the normalized 
correlation $R_p$ corresponding to the second open loop pitch
lag. 

A plurality of spectral tilts and a plurality of absolute 
maximums may be derived from the pre-processed speech 
signal 308 by the characterization module 328. In an 
example embodiment, the spectral tilts for 4 overlapped 
segments comprising 80 samples each are calculated. The 4 
overlapped segments may be weighted by a Hamming 
window of 80 samples. The absolute maximums of an 
example embodiment are derived from 8 overlapped seg- 
ments of the pre-processed speech signal 308. In general, the 
length of each of the 8 overlapped segments is about 1.5 
times the period of the open loop pitch lag. The absolute 
maximums may be used to create a smoothed contour of the 
amplitude envelope. 
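The segment-based measurements above can be sketched as follows. This is a minimal illustration, not the method of the microfiche listing: the 160-sample frame, the even spacing of the overlapped segments, and the definition of spectral tilt as the normalized lag-1 autocorrelation of the windowed segment are all assumptions, and the function names are hypothetical.

```python
import math

def hamming(n):
    """Symmetric Hamming window of length n (n >= 2)."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]

def overlapped_segments(signal, count, length):
    """Return `count` evenly spaced, overlapping segments of `length` samples.

    Even spacing is an assumption; the text only says the segments overlap.
    """
    step = (len(signal) - length) / (count - 1)
    starts = [int(round(k * step)) for k in range(count)]
    return [signal[s:s + length] for s in starts]

def spectral_tilt(segment):
    """Assumed tilt measure: lag-1 autocorrelation over energy, after windowing."""
    w = hamming(len(segment))
    x = [s * wi for s, wi in zip(segment, w)]
    r0 = sum(v * v for v in x)
    r1 = sum(x[i] * x[i - 1] for i in range(1, len(x)))
    return r1 / r0 if r0 > 0.0 else 0.0

def frame_tilts(frame):
    """Spectral tilts for 4 overlapped, Hamming-weighted segments of 80 samples."""
    return [spectral_tilt(seg) for seg in overlapped_segments(frame, 4, 80)]

def frame_abs_maxima(frame, pitch_lag):
    """Absolute maxima of 8 overlapped segments, each about 1.5 pitch periods long."""
    length = max(1, min(len(frame), int(1.5 * pitch_lag)))
    return [max(abs(v) for v in seg) for seg in overlapped_segments(frame, 8, length)]
```

A low-frequency frame yields tilts near +1 under this definition, while a rapidly alternating frame yields tilts near -1; the eight maxima trace the amplitude envelope across the frame.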

The spectral tilt, the absolute maximum, and the pitch 
correlation $R_p$ parameters may be updated or interpolated
multiple times per frame. Average values for these param- 
eters may also be calculated several times for frames char- 
acterized as background noise by the VAD module 326. In 
an example embodiment, 8 updated estimates of each 
parameter are obtained using 8 segments of 20 samples each.
The estimates of the parameters for the background noise 
may be subtracted from the estimates of parameters for 
subsequent frames not characterized as background noise to 
create a set of "noise cleaned" parameters. 
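A minimal sketch of the noise-cleaning step, assuming a single scalar parameter (for example the spectral tilt) and a simple recursive average over background noise frames. The class name `NoiseCleaner` and the smoothing factor are hypothetical; the text only states that averages are kept for noise frames and subtracted from later estimates.

```python
class NoiseCleaner:
    """Tracks the background-noise average of one parameter and removes it."""

    def __init__(self, smoothing=0.9):
        # smoothing is an assumed recursive-average factor, not a value from the text
        self.smoothing = smoothing
        self.noise_estimate = None

    def update_noise(self, estimates):
        """Fold the per-frame estimates (e.g. 8 values) of a noise frame into the average."""
        frame_avg = sum(estimates) / len(estimates)
        if self.noise_estimate is None:
            self.noise_estimate = frame_avg
        else:
            self.noise_estimate = (self.smoothing * self.noise_estimate
                                   + (1.0 - self.smoothing) * frame_avg)

    def clean(self, estimates):
        """Subtract the noise average from the estimates of a non-noise frame."""
        bias = 0.0 if self.noise_estimate is None else self.noise_estimate
        return [e - bias for e in estimates]
```

In use, `update_noise` would be called for frames the VAD module marks as background noise and `clean` for the subsequent frames.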

A set of statistically based decision parameters may be 
calculated from the "noise cleaned" parameters and the open
loop pitch lag. Each of the statistically based decision 
parameters represents a statistical property of the original 
parameters, such as an average, deviation, evolution,
maximum, or minimum. Using a set of predetermined
threshold parameters, initial characterization decisions may 
be made for the current frame based on the statistical 
decision parameters. Based on the initial characterization 
decision, past characterization decisions, and the voice 
activity decision of the VAD module 326, an initial class 
decision may be made for the frame. The initial class 
decision characterizes the frame as one of the classes 0, 2, 
3, or as a voiced frame represented by classes 4 and 5. 

3.10.3 Noise-to-Signal Ratio Derivation 

In addition to the frame characterization, the character- 
ization module 328 of one embodiment also performs the 
fourth category of analysis by deriving a noise-to-signal
ratio (NSR). The NSR is a traditional distortion criterion that
may be calculated as the ratio between an estimate of the
background noise energy and the frame energy of a frame.
One embodiment of the NSR calculation ensures that only
true background noise is included in the ratio by using a
modified voice activity decision. The modified voice activity
decision is derived using the initial voice activity decision
by the VAD module 326, the energy of the frame of the
pre-processed speech signal 308 and the LSFs calculated for
the lookahead portion. If the modified voice activity decision
indicates that the frame is background noise, the energy
of the background noise is updated.

The background noise is updated from the frame energy
using, for example, a moving average. If the energy level of
the background noise is larger than the frame energy, it is
replaced by the frame energy. Replacement by the frame
energy can involve shifting the energy level of the
background noise lower and truncating the result. The result
represents the estimate of the background noise energy that
may be used in the calculation of the NSR.

Following calculation of the NSR, the characterization
module 328 performs correction of the initial class decision
to a modified class decision. The correction may be
performed using the initial class decision, the voice activity
determination and the unvoiced noise-like speech
determination. In addition, previously calculated parameters
representing, for example, the spectrum expressed by the
reflection coefficients, the pitch correlation $R_p$, the NSR,
the energy of the frame, the energy of the previous frames,
the residual sharpness and a sharpness of the weighted speech
may also be used. The correction of the initial class decision
is called characterization tuning. Characterization tuning can
change the initial class decision, as well as set an onset
condition flag and a noisy voiced flag if these conditions are
identified. In addition, tuning can also trigger a change in the
voice activity decision by the VAD module 326.
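The background noise update and the NSR described in this subsection can be sketched as below. This is only an illustration under stated assumptions: the moving-average factor `alpha`, the choice to apply the clamp only on frames flagged as background noise, and the small `floor` guarding the division are not taken from the source, and both function names are hypothetical.

```python
def update_noise_energy(noise_energy, frame_energy, is_background, alpha=0.75):
    """Update the background-noise energy estimate for one frame.

    is_background is the modified voice activity decision (True for noise).
    alpha is an assumed moving-average factor.
    """
    if not is_background:
        return noise_energy
    updated = alpha * noise_energy + (1.0 - alpha) * frame_energy
    # If the estimate exceeds the frame energy, replace it by the frame energy.
    if updated > frame_energy:
        updated = frame_energy
    return updated

def noise_to_signal_ratio(noise_energy, frame_energy, floor=1e-9):
    """NSR: background-noise energy estimate over the frame energy."""
    return noise_energy / max(frame_energy, floor)
```

With this rule the estimate drops immediately to a quieter noise frame but rises only gradually toward a louder one, which keeps speech energy from leaking into the noise estimate.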

3.10.4 4-Grade Characterization 

The characterization module 328 can also generate the
fifth category of characterization, namely, the 4-grade
characterization. The 4-grade characterization is a parameter
that controls the pitch pre-processing module 322. One
embodiment of the 4-grade characterization distinguishes
between 4 categories. The categories may be labeled
numerically from 1 to 4. The category labeled 1 is used to
reset the pitch pre-processing module 322 in order to prevent
accumulated delay that exceeds a delay budget during pitch
pre-processing. In general, the remaining categories indicate
increasing voicing strength. Voicing strength is a measure of
the periodicity of the speech. In an alternative embodiment,
more or fewer categories could be included to indicate the
levels of voicing strength.

3.10.5 Stationary Long-Term Spectral Characteristics

The characterization module 328 may also perform the
sixth category of analysis by determining the stationary
long-term spectral characteristics of the pre-processed
speech signal 308. The stationary long-term spectral
characteristic is determined over a plurality of frames using,
for example, spectral information such as the LSFs, the
6-class signal characterization and the open loop pitch gain.
The determination is based on long-term averages of these
parameters.

3.11 Rate Selection Module 

Following the modified class decision by the character- 
ization module 328, the rate selection module 320 can make 
an initial rate selection called an open loop rate selection.
The rate selection module 320 can use, for example, the
modified class decision, the NSR, the onset flag, the residual
energy, the sharpness, the pitch correlation and spectral
parameters such as the reflection coefficients in determining
the open loop rate selection. The open loop rate selection
may also be selected based on the Mode that the speech
compression system 10 is operating within. The rate
selection module 320 is tuned to provide the desired average
bit rate as indicated by each of the Modes. The initial rate
selection may be modified following processing by the pitch
pre-processing module 322 that will be described later.

3.12 Pitch Pre-Processing Module

The pitch pre-processing module 322 operates on a frame
basis to perform analysis and modification of the weighted
speech 344. The pitch pre-processing module 322 may, for
example, use compression or dilation techniques on pitch
cycles of the weighted speech 344 in order to improve the
encoding process. The open loop pitch lag is quantized by
the pitch pre-processing module 322 to generate the open
loop adaptive codebook component 144a or 176a, as
previously discussed with reference to FIGS. 2, 4 and 5. If the
final type classification of the frame is Type One, this
quantization represents the pitch lag for the frame. However,
if the type classification is changed following processing by
the pitch pre-processing module 322, the pitch lag
quantization also is changed to represent the closed loop
adaptive codebook component 144b or 176b, as previously
discussed with reference to FIGS. 2, 4 and 5.

The open loop pitch lag for the frame that was generated 
by the open loop pitch estimation module 316 is quantized 
and interpolated to create a pitch track 348. In general, the
pitch pre-processing module 322 attempts to modify the
weighted speech 344 to fit the pitch track 348. If the
modification is successful, the final type classification of the
frame is Type One. If the modification is unsuccessful, the
final type classification of the frame is Type Zero.

As further detailed later, the pitch pre-processing
modification procedure can perform continuous time warping
of the weighted speech 344. The warping introduces a variable
delay. In one example embodiment, the maximum variable
delay within the encoding system 12 is 20 samples (2.5 ms).
The weighted speech 344 may be modified on a pitch
cycle-by-pitch cycle basis, with certain overlap between
adjacent pitch cycles to avoid discontinuities between the
reconstructed/modified segments. The weighted speech 344
may be modified according to the pitch track 348 to generate
a modified weighted speech 350. In addition, a plurality of
unquantized pitch gains 352 are generated by the pitch
pre-processing module 322. If the type classification of the
frame is Type One, the unquantized pitch gains 352 are used
to generate the Type One adaptive codebook gain component
148b (for full rate codec 22) or 180b (for half-rate
codec 24). The pitch track 348, the modified weighted
speech 350 and the unquantized pitch gains 352 are provided
to the excitation-processing module 54.

As previously discussed, the 4-grade characterization by
the characterization module 328 controls the pitch
pre-processing. In one embodiment, if the frame is
predominantly background noise or unvoiced with low pitch
correlation, such as category 1, the frame remains
unchanged and the accumulated delay of the pitch
pre-processing is reset to zero. If the frame is predominantly
pulse-like unvoiced, such as category 2, the accumulated
delay may be maintained without any warping of the signal
except for a simple time shift. The time shift may be
determined according to the accumulated delay of the input
speech signal 18. For frames with the remaining 4-grade
characterizations, the core of the pitch pre-processing
algorithm may be executed in order to optimally warp the
signal.
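The control flow above can be sketched as a small dispatcher. This is a hedged illustration: the integer delay, the buffer indexing, and the `warp` callback (standing in for the core warping algorithm) are assumptions introduced for the example, and the function names are hypothetical.

```python
def time_shift(signal, start, length, delay):
    """Read `length` samples beginning `delay` samples after `start` (integer delay assumed)."""
    return signal[start + delay:start + delay + length]

def pitch_preprocess(grade, signal, start, length, accumulated_delay, warp):
    """Dispatch on the 4-grade characterization (1 to 4).

    Returns (modified_samples, new_accumulated_delay).
    """
    if grade == 1:
        # Noise or weakly correlated unvoiced: leave the frame unchanged, reset the delay.
        return signal[start:start + length], 0
    if grade == 2:
        # Pulse-like unvoiced: keep the delay and apply only a simple time shift.
        return time_shift(signal, start, length, accumulated_delay), accumulated_delay
    # Grades 3 and 4: run the core warping algorithm (supplied by the caller).
    return warp(signal, start, length, accumulated_delay)
```

Resetting the delay on grade 1 frames is what keeps the accumulated warping delay inside the delay budget mentioned in section 3.10.4.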
In general, the core of the pitch pre-processing module
322 in one embodiment performs three main tasks. First, the
weighted speech 344 is modified in an attempt to match the
pitch track 348. Second, a pitch gain and a pitch correlation
for the signal are estimated. Finally, the characterization of
the speech signal 18 and the rate selection are refined based
on the additional signal information obtained during the
pitch pre-processing analysis. In another embodiment,
additional pitch pre-processing may be included, such as
waveform interpolation. In general, waveform interpolation
may be used to modify certain irregular transition segments
using forward-backward waveform interpolation techniques to
enhance the regularities and suppress the irregularities of the
weighted speech 344.

3.12.1 Modification

Modification of the weighted speech 344 provides a more 
accurate fit of the weighted speech 344 into a pitch-coding 
model that is similar to the Relaxed Code Excited Linear 
Prediction (RCELP) speech coding approach. An example 
of an implementation of RCELP speech coding is provided 
in the TIA (Telecommunications Industry Association) 
IS-127 standard. Performance of the modification without
any loss of perceptual quality can include a fine pitch search, 
estimation of a segment size, target signal warping, and 
signal warping. The fine pitch search may be performed on 
a frame level basis while the estimation of a segment size, 
the target signal warping, and the signal warping may be 
executed for each pitch cycle. 

3.12.1.1 Fine Pitch Search 

The fine pitch search may be performed on the weighted 
speech 344, based on the previously determined second and 
third pitch lags, the rate selection, and the accumulated pitch 
pre-processing delay. The fine pitch search searches for 
fractional pitch lags. The fractional pitch lags are non- 
integer pitch lags that combine with the quantization of the 
lags. The combination is derived by searching the quanti- 
zation tables of the lags used to quantize the open loop pitch 
lags and finding lags that maximize the pitch correlation of 
the weighted speech 344. In one embodiment, the search is 
performed differently for each codec due to the different 
quantization techniques associated with the different rate 
selections. The search is performed in a search area that is 
identified by the open loop pitch lag and is controlled by the 
accumulated delay. 

3.12.1.2 Estimate Segment Size 

The segment size follows the pitch period, with some 
minor adjustments. In general, the pitch complex (the main
pulses) of the pitch cycle is located towards the end of a
segment in order to allow for maximum accuracy of the
warping on the perceptually most important part, the pitch
complex. For a given segment the starting point is fixed and
the end point may be moved to obtain the best model fit. 
Movement of the end point effectively stretches or com- 
presses the time scale. Consequently, the samples at the 
beginning of the segment are hardly shifted, and the greatest 
shift will occur towards the end of the segment. 
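The effect of moving the end point can be illustrated with a purely linear time-scale mapping. This is an assumption made for the example only; the mapping actually used for reconstruction is the one given in section 3.12.1.4. The function names are hypothetical.

```python
def warp_positions(segment_length, end_shift):
    """Map output sample n to a (fractional) input position.

    The start point stays fixed; the end point is moved by end_shift samples,
    so the time scale is stretched (end_shift > 0) or compressed (end_shift < 0).
    Requires segment_length >= 2.
    """
    scale = (segment_length - 1 + end_shift) / (segment_length - 1)
    return [n * scale for n in range(segment_length)]

def sample_shifts(segment_length, end_shift):
    """Shift applied to each sample: zero at the start, end_shift at the end."""
    return [p - n for n, p in enumerate(warp_positions(segment_length, end_shift))]
```

For a 41-sample segment whose end point moves by 2 samples, the shift grows from 0 at the first sample to 2 at the last, which is why the pitch complex is placed near the end of the segment.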

3.12.1.3 Target Signal for Warping 

One embodiment of the target signal for time warping is
a synthesis of the current segment derived from the modified
weighted speech 350 that is represented by $s'_w(n)$ and the
pitch track 348 represented by $L_p(n)$. According to the pitch
track 348, $L_p(n)$, each sample value of the target signal
$s_w^t(n)$, $n = 0, \ldots, N_s-1$, may be obtained by
interpolation of the modified weighted speech 350 using a
21st order Hamming weighted Sinc window,

$$s_w^t(n) = \sum_{i=-10}^{10} w_s\big(f(L_p(n)),\, i\big) \cdot s'_w\big(n - i(L_p(n)) + i\big), \quad \text{for } n = 0, \ldots, N_s-1 \qquad \text{(Equation 9)}$$

where $i(L_p(n))$ and $f(L_p(n))$ are the integer and fractional
parts of the pitch lag, respectively; $w_s(f, i)$ is the Hamming
weighted Sinc window, and $N_s$ is the length of the segment.
A weighted target, $s_w^{wt}(n)$, is given by
$s_w^{wt}(n) = w_e(n) \cdot s_w^t(n)$. The weighting function,
$w_e(n)$, may be a two-piece linear function, which emphasizes
the pitch complex and de-emphasizes the "noise" in between
pitch complexes. The weighting may be adapted according to
the 4-grade classification, by increasing the emphasis on the
pitch complex for segments of higher periodicity.

The integer shift that maximizes the normalized cross
correlation between the weighted target $s_w^{wt}(n)$ and the
weighted speech 344, $s_w(n+\tau_{acc})$, where
$s_w(n+\tau_{acc})$ is the weighted speech 344 shifted
according to an accumulated delay $\tau_{acc}$, may be found
by maximizing

$$R(\tau_{shift}) = \frac{\sum_{n=0}^{N_s-1} s_w^{wt}(n) \cdot s_w(n+\tau_{acc}+\tau_{shift})}{\sqrt{\left(\sum_{n=0}^{N_s-1} s_w^{wt}(n)^2\right) \cdot \left(\sum_{n=0}^{N_s-1} s_w(n+\tau_{acc}+\tau_{shift})^2\right)}} \qquad \text{(Equation 10)}$$

A refined (fractional) shift may be determined by searching
an upsampled version of $R(\tau_{shift})$ in the vicinity of
$\tau_{shift}$. This may result in a final optimal shift
$\tau_{opt}$ and the corresponding normalized cross correlation
$R_n(\tau_{opt})$.

3.12.1.4 Signal Warping

The modified weighted speech 350 for the segment may
be reconstructed according to the mapping given by

$$[\,s_w(n+\tau_{acc}),\; s_w(n+\tau_{acc}+\tau_c+\tau_{opt})\,] \rightarrow [\,s'_w(n),\; s'_w(n+\tau_c-1)\,] \qquad \text{(Equation 11)}$$

and

$$[\,s_w(n+\tau_{acc}+\tau_c+\tau_{opt}),\; s_w(n+\tau_{acc}+\tau_{opt}+N_s-1)\,] \rightarrow [\,s'_w(n+\tau_c),\; s'_w(n+N_s-1)\,] \qquad \text{(Equation 12)}$$

where $\tau_c$ is a parameter defining the warping function. In
general, $\tau_c$ specifies the beginning of the pitch complex.
The mapping given by Equation 11 specifies a time warping, and
the mapping given by Equation 12 specifies a time shift (no
warping). Both may be carried out using a Hamming
weighted Sinc window function.

3.12.2 Pitch Gain and Pitch Correlation Estimation

The pitch gain and pitch correlation may be estimated on
a pitch cycle basis and are defined by Equations 13 and 14,
respectively. The pitch gain is estimated in order to minimize
the mean squared error between the target $s_w^t(n)$,
defined by Equation 9, and the final modified signal $s'_w(n)$,
defined by Equations 11 and 12, and may be given by

$$g_a = \frac{\sum_{n=0}^{N_s-1} s'_w(n) \cdot s_w^t(n)}{\sum_{n=0}^{N_s-1} s_w^t(n)^2} \qquad \text{(Equation 13)}$$

The pitch gain is provided to the excitation-processing
module 54 as the unquantized pitch gains 352. The pitch
correlation may be given by

$$R_a = \frac{\sum_{n=0}^{N_s-1} s'_w(n) \cdot s_w^t(n)}{\sqrt{\left(\sum_{n=0}^{N_s-1} s'_w(n)^2\right) \cdot \left(\sum_{n=0}^{N_s-1} s_w^t(n)^2\right)}} \qquad \text{(Equation 14)}$$

Both parameters are available on a pitch cycle basis and may
be linearly interpolated.

3.12.3 Refined Classification and Refined Rate Selection

Following pitch pre-processing by the pitch pre-processing
module 322, the average pitch correlation and the
pitch gains are provided to the characterization module 328
and the rate selection module 320. The characterization
module 328 and the rate selection module 320 create a final
characterization class and a final rate selection, respectively,
using the pitch correlation and the pitch gains. The final
characterization class and the final rate selection may be
determined by refining the 6-class signal characterization
and the open loop rate selection of the frame.

Specifically, the characterization module 328 determines
whether a frame with a characterization as a voiced frame
should be characterized as class 4, "Non-Stationary
Voiced", or class 5, "Stationary Voiced." In addition, a final
determination that a particular frame is stationary noise-like
unvoiced speech may occur based on the previous
determination that the particular frame is modified unvoiced
noise-like speech. Frames confirmed to be noise-like unvoiced
speech may be characterized as class 1, "Stationary
Noise-Like Unvoiced Speech."

Based on the final characterization class, the open loop
rate selection by the rate selection module 320 and the half
rate signaling flag on the half rate signal line 30 (FIG. 1), a
final rate selection may be determined. The final rate
selection is provided to the excitation-processing module 54
as a rate selection indicator 354. In addition, the final
characterization class for the frame is provided to the
excitation-processing module 54 as control information 356.

3.13 Type Classification Module

For the full rate codec 22 and the half rate codec 24, the
final characterization class may also be used by the type
classification module 324. A frame with a final
characterization class of class 0 to 4 is determined to be a
Type Zero frame, and a frame of class 5 is determined to be a
Type One frame. The type classification is provided to the
excitation-processing module 54 as a type indicator 358.

4.0 Excitation Processing Module

The type indicator 358 from the type classification module
324 selectively activates either the full-rate module 54 or
the half-rate module 56, as illustrated in FIG. 2, depending
on the rate selection. FIG. 10 is a block diagram representing
the F0 or H0 first subframe-processing module 70 or 80
illustrated in FIG. 2 that is activated for the Type Zero
classification. Similarly, FIG. 11 is a block diagram
representing the F1 or H1 first frame processing module 72 or
82, the F1 or H1 second subframe processing module 74 or 84
and the F1 or H1 second frame processing module 76 or 86
that are activated for Type One classification. As previously
discussed, the "F" and "H" represent the full-rate codec 22
and the half-rate codec 24, respectively.

Activation of the quarter-rate module 60 and the eighth-rate
module 62 illustrated in FIG. 2 may be based on the rate
selection. In one embodiment, a pseudo-random sequence is
generated and scaled to represent the short-term excitation.
The energy components 204 and 242 (FIG. 2) represent the
scaling of the pseudo-random sequence, as previously
discussed. In one embodiment, the "seed" used for generating
the pseudo-random sequence is extracted from the bitstream,
thereby providing synchronicity between the encoding
system 12 and the decoding system 16.
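The shared-seed idea can be sketched as follows: encoder and decoder run the same generator from the same seed and scale the result to the transmitted energy. How the seed is extracted from the bitstream and how the energy is coded are not specified here, so `seed` and `target_energy` are hypothetical inputs, and Python's `random.Random` merely stands in for whatever generator the codec uses.

```python
import math
import random

def noise_excitation(seed, length, target_energy):
    """Generate a pseudo-random excitation scaled to a given total energy.

    seed: integer assumed to be derived from bits of the frame's bitstream.
    target_energy: assumed decoded value of the energy component.
    """
    rng = random.Random(seed)          # identical seed gives an identical sequence
    raw = [rng.uniform(-1.0, 1.0) for _ in range(length)]
    energy = sum(v * v for v in raw)
    gain = math.sqrt(target_energy / energy) if energy > 0.0 else 0.0
    return [gain * v for v in raw]
```

Because both ends derive the seed from bits that are already transmitted, no extra bits are spent on the excitation shape for these low-rate frames.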

As previously discussed, the excitation processing module
54 also receives the modified weighted speech 350, the
unquantized pitch gains 352, the rate indicator 354 and the
control information 356. The quarter and eighth rate codecs
26 and 28 do not utilize these signals during processing.
However, these parameters may be used to further process
frames of the speech signal 18 within the full-rate codec 22
and the half-rate codec 24. Use of these parameters by the
full-rate codec 22 and the half-rate codec 24, as described
later, depends on the type classification of the frame as Type
Zero or Type One.
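The class-to-type mapping performed by the type classification module 324 (section 3.13) is simple enough to state as code. The enumeration names below are illustrative labels for the six classes listed in section 3.10.2, not identifiers from the program listing.

```python
from enum import IntEnum

class FrameClass(IntEnum):
    """The six signal characterization classes of one embodiment."""
    SILENCE_BACKGROUND_NOISE = 0
    STATIONARY_NOISE_LIKE_UNVOICED = 1
    NON_STATIONARY_UNVOICED = 2
    ONSET = 3
    NON_STATIONARY_VOICED = 4
    STATIONARY_VOICED = 5

def frame_type(final_class):
    """Return 1 (Type One) for class 5 and 0 (Type Zero) for classes 0 to 4."""
    return 1 if FrameClass(final_class) == FrameClass.STATIONARY_VOICED else 0
```

Only stationary voiced frames become Type One, which is the case where the pitch track fits well enough for the open loop pitch lag to be used directly.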

4.1 Excitation Processing Module for Type Zero Frames of 
the Full-Rate Codec and the Half-Rate Codec 

Referring now to FIG. 10, one embodiment of the F0 or
H0 first subframe-processing module 70, 80 comprises an
adaptive codebook section 362, a fixed codebook section
364 and a gain quantization section 366. The processing and
coding for frames of Type Zero is somewhat similar to the
traditional CELP encoding, for example, of TIA
(Telecommunications Industry Association) standard
IS-127. For the full-rate codec 22, the frame may be divided
into four subframes, while for the half-rate codec 24, the
frame may be divided into two subframes, as previously
discussed. The functions represented in FIG. 10 are executed
on a subframe basis.
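As a small illustration of the subframe division for Type Zero frames, the sketch below splits a frame into equal subframes. The 160-sample frame is an assumption consistent with the 40-sample (full-rate) and 80-sample (half-rate) subframes of the example embodiment; the function name and the codec keys are hypothetical.

```python
SUBFRAMES_PER_FRAME = {"full": 4, "half": 2}  # Type Zero frames

def split_subframes(frame, codec):
    """Split a frame into equal-length subframes for the given codec."""
    count = SUBFRAMES_PER_FRAME[codec]
    size = len(frame) // count
    return [frame[k * size:(k + 1) * size] for k in range(count)]
```

Each subframe is then processed in turn by the adaptive codebook, fixed codebook and gain quantization sections.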

The F0 or H0 first subframe-processing modules 70 and 80
(FIG. 2) operate to determine the closed loop pitch lag and
the corresponding adaptive codebook gain for the adaptive 
codebook. In addition, the long-term residual is quantized 
using the fixed codebook, and the corresponding fixed 
codebook gain is also determined. Quantization of the 
closed loop pitch lag and joint quantization of the adaptive 
codebook gain and the fixed codebook gain are also per- 
formed. 

4.1.1 Adaptive Codebook Section 

The adaptive codebook section 362 includes an adaptive 
codebook 368, a first multiplier 370, a first synthesis filter 
372, a first perceptual weighting filter 374, a first subtractor 
376 and a first minimization module 378. The adaptive 
codebook section 362 performs a search for the best closed 
loop pitch lag from the adaptive codebook 368 using the 
analysis-by-synthesis (ABS) approach. 

A segment from the adaptive codebook 368 corresponding
to the closed loop pitch lag may be referred to as an
adaptive codebook vector ($v_a$) 382. The pitch track 348 from
the pitch pre-processing module 322 of FIG. 9 may be used
to identify an area in the adaptive codebook 368 to search for
vectors for the adaptive codebook vector ($v_a$) 382. The first
multiplier 370 multiplies the selected adaptive codebook
vector ($v_a$) 382 by a gain ($g_a$) 384. The gain ($g_a$) 384 is
unquantized and represents an initial adaptive codebook
gain that is calculated as will be described later. The
resulting signal is passed to the first synthesis filter 372 that
performs a function that is the inverse of the LPC analysis
previously discussed. The first synthesis filter 372 receives
the quantized LPC coefficients $A_q(z)$ 342 from the LSF
quantization module 334 and, together with the first
perceptual weighting filter module 374, creates a first
resynthesized speech signal 386. The first subtractor 376
subtracts the first resynthesized speech signal 386 from the
modified weighted speech 350 to generate a long-term error
signal 388. The modified weighted speech 350 is the target
signal for the search in the adaptive codebook 368.

The first minimization module 378 receives the long-term 
error signal 388 that is a vector representing the error in 
quantizing the closed loop pitch lag. The first minimization 
module 378 performs calculation of the energy of the vector 
and determination of the corresponding weighted mean 
squared error. In addition, the first minimization module 378 
controls the search and selection of vectors from the adaptive
codebook 368 for the adaptive codebook vector ($v_a$) 382
in order to reduce the energy of the long-term error signal 
388. 

The search process repeats until the first minimization
module 378 has selected the best vector for the adaptive
codebook vector ($v_a$) 382 from the adaptive codebook 368
for each subframe. The index location of the best vector for
the adaptive codebook vector ($v_a$) 382 within the adaptive
codebook 368 forms part of the closed loop adaptive
codebook component 144b, 176b (FIG. 2). This search process
effectively minimizes the energy of the long-term error
signal 388. The best closed loop pitch lag is selected by
selecting the best adaptive codebook vector ($v_a$) 382 from
the adaptive codebook 368. The resulting long-term error
signal 388 is the modified weighted speech signal 350 less
the filtered best vector for the adaptive codebook vector
($v_a$) 382.

4.1.1.1 Closed-Loop Adaptive Codebook Search for the 
Full-Rate Codec 

The closed loop pitch lag for the full-rate codec 22 is 
represented in the bitstream by the closed loop adaptive 
codebook component 144b. For one embodiment of the
full-rate codec 22, the closed loop pitch lags for the first and 
the third subframes are represented with 8 bits, and the 
closed loop pitch lags for the second and the fourth sub- 
frames are represented with 5 bits, as previously discussed. 
In one embodiment, the lag is in a range of 17 to 148 lags. 
The 8 bits and the 5 bits may represent the same pitch 
resolution. However, the 8 bits may also represent the full 
range of the closed loop pitch lag for a subframe and the 5 
bits may represent a limited value of closed loop pitch lags 
around the previous subframe closed loop pitch lag. In an 
example embodiment, the closed loop pitch lag resolution is 
0.2, uniformly, between lag 17 and lag 33. From lag 33 to lag 
91 of the example embodiment, the resolution is gradually 
increased from 0.2 to 0.5, and the resolution from lag 91 to 
lag 148 is 1.0, uniformly. 

The adaptive codebook section 362 performs an integer
lag search for closed loop integer pitch lags. For the first and
the third subframes (i.e., those represented with 8 bits), the
integer lag search may be performed on the range of
[$L_p-3$, ..., $L_p+3$], where $L_p$ is the subframe pitch lag.
The subframe pitch lag is obtained from the pitch track 348,
which is used to identify a vector in the adaptive codebook
368. The cross-correlation function, $R(l)$, for the integer lag
search range may be calculated according to

(Equation 15)

where $t(n)$ is the target signal that is the modified weighted
speech 350, $e(n)$ is the adaptive codebook contribution
represented by the adaptive codebook vector ($v_a$) 382, and
$h(n)$ is the combined response of the first synthesis filter
372 and the perceptual weighting filter 374. In the example
embodiment, there are 40 samples in a subframe, although
more or fewer samples could be used.

The closed loop integer pitch lag that maximizes $R(l)$ may
be chosen as a refined integer lag. The best vector from the
adaptive codebook 368 for the adaptive codebook vector
($v_a$) 382 may be determined by upsampling the
cross-correlation function $R(l)$ using a 9th order Hamming
weighted Sinc. Upsampling is followed by a search of the
vectors within the adaptive codebook 368 that correspond to
closed loop pitch lags that are within 1 sample of the refined
integer lag. The index location within the adaptive codebook
368 of the best vector for the adaptive codebook vector ($v_a$)
382 for each subframe is represented by the closed loop
adaptive codebook component 144b in the bitstream.
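The integer lag search can be sketched as below. The form of the correlation used here, the inner product of the target with the filtered codebook vector divided by the square root of the filtered vector's energy, is an assumption in the manner of conventional CELP searches and is not copied from Equation 15. The cycle repetition for lags shorter than the subframe, the zero-state filtering, and all function names are likewise illustrative.

```python
import math

def codebook_vector(past_excitation, lag, length):
    """Build e(n - lag) for n = 0..length-1, repeating the cycle when lag < length."""
    vec = []
    for n in range(length):
        if n < lag:
            vec.append(past_excitation[len(past_excitation) - lag + n])
        else:
            vec.append(vec[n - lag])
    return vec

def filter_zero_state(x, h):
    """Convolve x with the impulse response h, truncated to len(x) samples."""
    return [sum(h[k] * x[n - k] for k in range(min(n + 1, len(h))))
            for n in range(len(x))]

def integer_lag_search(target, past_excitation, h, pitch_lag, span=3):
    """Search lags pitch_lag-span .. pitch_lag+span for the largest correlation."""
    best_lag, best_r = None, -math.inf
    for lag in range(pitch_lag - span, pitch_lag + span + 1):
        y = filter_zero_state(codebook_vector(past_excitation, lag, len(target)), h)
        energy = sum(v * v for v in y)
        if energy <= 0.0:
            continue  # an all-zero candidate cannot be normalized
        r = sum(t * v for t, v in zip(target, y)) / math.sqrt(energy)
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag, best_r
```

The fractional refinement described above would then upsample the correlation around the returned integer lag; that step is omitted from this sketch.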

The initial adaptive codebook gain may be estimated
according to:

(Equation 16)

where $L_p^{opt}$ represents the lag of the best vector for the
adaptive codebook vector ($v_a$) 382 and $e(n-L_p^{opt})$
represents the best vector for the adaptive codebook vector
($v_a$) 382. In addition, in this example embodiment, the
estimate is bounded by $0.0 \le g \le 1.2$, and $n$ ranges over
the 40 samples in a subframe. A normalized adaptive codebook
correlation is given by $R(l)$ when $l = L_p^{opt}$. The initial
adaptive codebook gain may be further normalized according to
the normalized adaptive codebook correlation, the initial
class decision and the sharpness of the adaptive codebook
contribution. The normalization results in the gain ($g_a$) 384.
The gain ($g_a$) 384 is unquantized and represents the initial
adaptive codebook gain for the closed loop pitch lag.
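A minimal sketch of the bounded gain estimate, assuming the usual least-squares form (correlation of the target with the filtered best vector, divided by the filtered vector's energy). Only the bounds of 0.0 and 1.2 come from the text; the formula and the function name are assumptions, and the further normalization by class and sharpness is not shown.

```python
def initial_adaptive_gain(target, filtered_vector, low=0.0, high=1.2):
    """Least-squares gain of the filtered codebook vector against the target, clamped."""
    energy = sum(v * v for v in filtered_vector)
    if energy <= 0.0:
        return low
    gain = sum(t * v for t, v in zip(target, filtered_vector)) / energy
    return min(max(gain, low), high)
```

Clamping at 1.2 limits how strongly the past excitation can be amplified from one subframe to the next, and clamping at 0.0 rejects negatively correlated candidates.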

4.1.1.2 Closed-Loop Adaptive Codebook Search for Half- 
Rate Coding 

The closed loop pitch lag for the half-rate codec 24 is
represented by the closed loop adaptive codebook component
176b (FIG. 2). For the half-rate codec 24 of one
embodiment, the closed loop pitch lags for each of the two
subframes are encoded in 7 bits each, with each representing
a lag in the range of 17 to 127 lags. The integer lag search
may be performed on the range of [$L_p-3$, ..., $L_p+3$] as
opposed to the fractional search performed in the full-rate
codec 22. The cross-correlation function $R(l)$ may be
calculated as in Equation 15, where the summation is
performed on an example embodiment subframe size of 80
samples. The closed loop pitch lag that maximizes $R(l)$ is
chosen as the refined integer lag. The index location within
the adaptive codebook 368 of the best vector for the adaptive
codebook vector ($v_a$) 382 for each subframe is represented
by the closed loop adaptive codebook component 176b in
the bitstream.

The initial value for the adaptive codebook gain may be 
calculated according to Equation 16, where the summation 
is performed on an example embodiment subframe size of 
80 samples. The normalization procedures as previously 
discussed may then be applied resulting in the gain (g a ) 384 
that is unquantized. 

The long-term error signal 388 generated by either the 
full-rate codec 22 or the half-rate codec 24 is used during the 
search by the fixed codebook section 364. Prior to the fixed 
codebook search, the voice activity decision from the VAD 
module 326 of FIG. 9 that is applicable to the frame is 
obtained. The voice activity decision for the frame may be 
sub-divided into a subframe voice activity decision for each
subframe. The subframe voice activity decision may be used 
to improve perceptual selection of the fixed-codebook con- 
tribution. 

4.1.2 Fixed Codebook Section 

The fixed codebook section 364 includes a fixed codebook
390, a second multiplier 392, a second synthesis filter
394, a second perceptual weighting filter 396, a second 
subtractor 398, and a second minimization module 400. The 
search for the fixed codebook contribution by the fixed 
codebook section 364 is similar to the search within the 
adaptive codebook section 362. 

A fixed codebook vector ($v_c$) 402 representing the
long-term residual for a subframe is provided from the fixed
codebook 390. The second multiplier 392 multiplies the
fixed codebook vector ($v_c$) 402 by a gain ($g_c$) 404. The gain
($g_c$) 404 is unquantized and is a representation of the initial
value of the fixed codebook gain that may be calculated as
later described. The resulting signal is provided to the
second synthesis filter 394. The second synthesis filter 394
receives the quantized LPC coefficients $A_q(z)$ 342 from the
LSF quantization module 334 and, together with the second
perceptual weighting filter 396, creates a second
resynthesized speech signal 406. The second subtractor 398
subtracts the resynthesized speech signal 406 from the
long-term error signal 388 to generate a vector that is a fixed
codebook error signal 408.

The second minimization module 400 receives the fixed 
codebook error signal 408 that represents the error in 
quantizing the long-term residual by the fixed codebook 
390. The second minimization module 400 uses the energy 
of the fixed codebook error signal 408 to control the selec- 
tion of vectors for the fixed codebook vector (v c ) 402 from 
the fixed codebook 292 in order to reduce the energy of the 
fixed codebook error signal 408, The second minimization 
module 400 also receives the control information 356 from 
the characterization module 328 of FIG. 9. 

The final characterization class contained in the control 
information 356 controls how the second minimization 
module 400 selects vectors for the fixed codebook vector 
(vj 402 from the fixed codebook 390. The process repeats 
until the search by the second minimization module 400 has 
selected the best vector for the fixed codebook vector (v c ) 
402 from the fixed codebook 390 for each subframe. The 
best vector for the fixed codebook vector (v c ) 402 minimizes 
the error in the second resynthesized speech signal 406 with 
respect to the long-term error signal 388. The indices 
identify the best vector for the fixed codebook vector (v c ) 
402 and, as previously discussed, may be used to form the 
fixed codebook component 146a and 178a. 
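
The closed-loop search described above can be sketched as follows. This is a minimal illustration, not the codec's actual routine: the function names are hypothetical, `h` stands in for the combined impulse response of the second synthesis filter 394 and the second perceptual weighting filter 396, and the per-candidate gain is the plain least-squares gain, which may differ from the initial gain calculation described later in the text.

```python
from typing import List, Sequence, Tuple


def convolve_truncated(x: Sequence[float], h: Sequence[float]) -> List[float]:
    """Zero-state filtering y(n) = sum_k x(k) * h(n - k), truncated to len(x)."""
    y = [0.0] * len(x)
    for n in range(len(x)):
        acc = 0.0
        for k in range(n + 1):
            if n - k < len(h):
                acc += x[k] * h[n - k]
        y[n] = acc
    return y


def search_fixed_codebook(
    target: Sequence[float],
    codebook: Sequence[Sequence[float]],
    h: Sequence[float],
) -> Tuple[int, float, float]:
    """Return (index, gain, error_energy) of the lowest-error candidate.

    target   -- long-term error signal for one subframe (assumed name)
    codebook -- candidate fixed codebook vectors (assumed representation)
    h        -- weighted synthesis impulse response (assumed name)
    """
    best = (-1, 0.0, float("inf"))
    for index, candidate in enumerate(codebook):
        y = convolve_truncated(candidate, h)  # resynthesized candidate
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue  # an all-zero candidate cannot reduce the error
        corr = sum(t * v for t, v in zip(target, y))
        gain = corr / energy  # least-squares gain for this candidate
        err = sum((t - gain * v) ** 2 for t, v in zip(target, y))
        if err < best[2]:
            best = (index, gain, err)
    return best
```

An exhaustive loop like this is only practical for small codebooks; the iterative, controlled-complexity searches described in the following sections avoid visiting every candidate.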

4.1.2.1 Fixed Codebook Search for the Full-Rate Codec 

As previously discussed with reference to FIGS. 2 and 4, 
the fixed codebook component 146a for frames of Type Zero 
classification may represent each of four subframes of the 
full-rate codec 22 using the three 5-pulse codebooks 160. 
When the search is initiated, vectors for the fixed codebook 
vector (v_c) 402 within the fixed codebook 390 may be 
determined using the long-term error signal 388 that is 
represented by: 

(Equation 17) 



Pitch enhancement may be applied to the three 5-pulse 
codebooks 160 (illustrated in FIG. 4) within the fixed 
codebook 390 in the forward direction during the search. 
The search is an iterative, controlled complexity search for 
the best vector for the fixed codebook vector (v_c) 402. An 
initial value for fixed codebook gain represented by the gain 
(g_c) 404 may be found simultaneously with the search for the 
best vector for the fixed codebook vector (v_c) 402. 

In an example embodiment, the search for the best vector 
for the fixed codebook vector (v_c) 402 is completed in each 
of the three 5-pulse codebooks 160. At the conclusion of the 
search process within each of the three 5-pulse codebooks 
160, candidate best vectors for the fixed codebook vector 
(v_c) 402 have been identified. Selection of one of the three 
5-pulse codebooks 160 and which of the corresponding 
candidate best vectors will be used may be determined using 
the corresponding fixed codebook error signal 408 for each 
of the candidate best vectors. Determination of the weighted 
mean squared error (WMSE) for each of the corresponding 
fixed codebook error signals 408 by the second minimiza- 
tion module 400 is first performed. For purposes of this 
discussion, the weighted mean squared errors (WMSEs) for 
each of the candidate best vectors from each of the three 
5-pulse codebooks 160 will be referred to as first, second 
and third fixed codebook WMSEs. 

The first, second, and third fixed codebook WMSEs may 
be first weighted. Within the full-rate codec 22, for frames 
classified as Type Zero, the first, second, and third fixed 
codebook WMSEs may be weighted by the subframe voice 
activity decision. In addition, the weighting may be provided 
by a sharpness measure of each of the first, second, and third 
fixed codebook WMSEs and the NSR from the character- 
ization module 328 of FIG. 9. Based on the weighting, one 
of the three 5-pulse fixed codebooks 160 and the best 
candidate vector in that codebook may be selected. 
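
The weighted selection among the three candidates can be sketched as below. The text does not give the weighting formula, so the per-codebook weights here are purely hypothetical placeholders for whatever factor is derived from the subframe voice activity decision, the sharpness measure, and the NSR; only the final "smallest weighted WMSE wins" step is illustrated.

```python
from typing import List, Sequence, Tuple


def select_codebook(
    wmses: Sequence[float], weights: Sequence[float]
) -> Tuple[int, List[float]]:
    """Pick the codebook whose weighted WMSE is smallest.

    wmses   -- first, second, and third fixed codebook WMSEs
    weights -- hypothetical perceptual weights, one per codebook
    """
    if len(wmses) != len(weights):
        raise ValueError("one weight is required per WMSE")
    weighted = [w * e for w, e in zip(weights, wmses)]
    best = min(range(len(weighted)), key=weighted.__getitem__)
    return best, weighted
```

With unit weights the codebook with the lowest raw WMSE is chosen; a weight below one biases the decision toward a codebook that is perceptually preferred even when its raw error is slightly higher.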

The selected 5-pulse codebook 160 may then be fine 
searched for a final decision of the best vector for the fixed 
codebook vector (v_c) 402. The fine search is performed on 
the vectors in the selected one of the three 5-pulse 
codebooks 160 that are in the vicinity of the best candidate 
vector chosen. The indices that identify the best vector for 
the fixed codebook vector (v_c) 402 within the selected one 
of the three 5-pulse codebooks 160 are part of the fixed 
codebook component 146a in the bitstream. 

4.1.2.2 Fixed Codebook Search for the Half-Rate Codec 

For frames of Type Zero classification, the fixed codebook 
component 178a represents each of the two subframes of the 
half-rate codec 24. As previously discussed, with reference 
to FIG. 5, the representation may be based on the pulse 
codebooks 192, 194 and the Gaussian codebook 195. The 
initial target for the fixed codebook gain represented by the 
gain (g_c) 404 may be determined similarly to the full-rate 
codec 22. In addition, the search for the fixed codebook 
vector (v_c) 402 within the fixed codebook 390 may be 
weighted similarly to the full-rate codec 22. In the half-rate 
codec 24, the weighting may be applied to the best candidate 
vectors from each of the pulse codebooks 192 and 194 as 
well as the Gaussian codebook 195. The weighting is 
applied to determine the most suitable fixed codebook vector 
(v_c) 402 from a perceptual point of view. In addition, the 
weighting of the weighted mean squared error (WMSE) in 
the half-rate codec 24 may be further enhanced to emphasize 
the perceptual point of view. Further enhancement may be 
accomplished by including additional parameters in the 
weighting. The additional parameters may be the closed loop 
pitch lag and the normalized adaptive codebook correlation. 

In addition to the enhanced weighting, prior to the search 
of the codebooks 192, 194, 195 for the best candidate 
vectors, some characteristics may be built into the entries of 
the pulse codebooks 192, 194. These characteristics can 
provide further enhancement to the perceptual quality. In 
one embodiment, enhanced perceptual quality during the 
searches may be achieved by modifying the filter response 
of the second synthesis filter 394 using three enhancements. 
The first enhancement may be accomplished by injecting 
high frequency noise into the fixed codebook, which modi- 
fies the high-frequency band. The injection of high fre- 
quency noise may be incorporated into the response of the 
second synthesis filter 394 by convolving the high frequency 
noise impulse response with the impulse response of the 
second synthesis filter 394. 
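
Folding the noise injection into the filter response works because convolution is associative: filtering a codebook entry with the noise response and then with the synthesis filter gives the same result as filtering once with the convolution of the two responses. A minimal sketch follows; the function and parameter names are hypothetical and the example responses are arbitrary.

```python
from typing import List, Sequence


def convolve(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Full linear convolution of two finite sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def modified_impulse_response(
    h_synth: Sequence[float], h_noise: Sequence[float], length: int
) -> List[float]:
    """Combine the synthesis and noise-injection responses, truncated to
    the subframe length used by the search (all names are assumptions)."""
    return convolve(h_synth, h_noise)[:length]
```

The same mechanism would apply to the additional-pulse and short-term spectral enhancements described next, since each is also expressed as a modification of the impulse response.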

The second enhancement may be used to incorporate 
additional pulses in locations that can be determined by high 
correlations in the previously quantized subframe. The 
amplitude of the additional pulses may be adjusted accord- 
ing to the correlation strength, thereby allowing the decod- 
ing system 16 to perform the same operation without the 
necessity of additional information from the encoding sys- 
tem 12. The contribution from these additional pulses also 
may be incorporated into the impulse response of the second 
synthesis filter 394. The third enhancement filters the fixed 
codebook 390 with a weak short-term spectral filter to 
compensate for the reduction in the formant sharpness 
resulting from bandwidth expansion and the quantization of 
the LSFs. 

The search for the best vector for the fixed codebook 
vector (v_c) 402 is based on minimizing the energy of the 
fixed codebook error signal 408, as previously discussed. 
The search may first be performed on the 2-pulse codebook 
192. The 3-pulse codebook 194 may be searched next, in 
two steps. The first step can determine a center for the 
second step that may be referred to as a focused search. 
Backward and forward weighted pitch enhancement may be 
applied for the search in both pulse codebooks 192 and 194. 
The Gaussian codebook 195 may be searched last, using a 
fast search routine that is used to determine the two orthogo- 
nal basis vectors for encoding as previously discussed. 

The selection of one of the codebooks 192, 194 and 195 
and the best vector for the fixed codebook vector (v_c) 402 
may be performed similarly to the full-rate codec 22. The 
indices that identify the best vector for the fixed codebook 
vector (v_c) 402 within the selected codebook are part of the 
fixed codebook component 178a in the bitstream. 

At this point, the best vectors for the adaptive codebook 
vector (v_a) 382 and the fixed codebook vector (v_c) 402 have 
been found within the adaptive and fixed codebooks 368, 
390, respectively. The unquantized initial values for the gain 
(g_a) 384 and the gain (g_c) 404 now may be replaced by the 
best gain values. The best gain values may be determined 
based on the best vectors for the adaptive codebook vector 
(v_a) 382 and the fixed codebook vector (v_c) 402 previously 
determined. Following determination of the best gains, they 
are jointly quantized. Determination and quantization of the 
gains occurs within the gain quantization section 366. 

4.1.3 Gain Quantization Section 

The gain quantization section 366 of one embodiment 
includes a 2D VQ gain codebook 412, a third multiplier 414, 
a fourth multiplier 416, an adder 418, a third synthesis filter 
420, a third perceptual weighting filter 422, a third 
subtractor 424, a third minimization module 426, and an energy 
modification section 428. The energy modification section 
428 of one embodiment includes an energy analysis module 
430 and an energy adjustment module 432. Determination 
and quantization of the fixed and adaptive codebook gains 
may be performed within the gain quantization section 366. 
In addition, further modification of the modified weighted 
speech 350 occurs in the energy modification section 428, as 
will be discussed, to form a modified target signal 434 that 
may be used for the quantization. 

Determination and quantization involves searching to 
determine a quantized gain vector (ĝ_ac) 433 that represents 
the joint quantization of the adaptive codebook gain and the 
fixed codebook gain. The adaptive and fixed codebook 
gains, for the search, may be obtained by minimizing the 
weighted mean square error according to: 

(Equation 18) 

where v_a(n) is the best vector for the adaptive codebook 
vector (v_a) 382, and v_c(n) is the best vector for the fixed 
codebook vector (v_c) 402 as previously discussed. In the 
example embodiment, the summation is based on a frame 
that contains 80 samples, such as in one embodiment of the 
half-rate codec 24. The minimization may be obtained 
jointly (obtaining g_a and g_c concurrently) or sequentially 
(obtaining g_a first and then g_c), depending on a threshold 
value of the normalized adaptive codebook correlation. The 
gains may then be modified in part, to smooth the 
fluctuations of the reconstructed speech in the presence of 
background noise. The modified gains are denoted g'_a and 
g'_c. The modified target signal 434 may be generated using 
the modified gains by: 

(Equation 19) 



A search for the best vector for the quantized gain vector 
(ĝ_ac) 433 is performed within the 2D VQ gain codebook 
412. The 2D VQ gain codebook 412 may be the previously 
discussed 2D gain quantization table illustrated as Table 4. 
The 2D VQ gain codebook 412 is searched for vectors for 
the quantized gain vector (ĝ_ac) 433 that minimize the mean 
square error, i.e., minimizing 

(Equation 20) 

where a quantized adaptive codebook gain (ĝ_a) 435 and a 
quantized fixed codebook gain (ĝ_c) 436 may be derived 
from the 2D VQ gain codebook 412. In the example 
embodiment, the summation is based on a frame that 
contains 80 samples, such as in one embodiment of the 
half-rate codec 24. The quantized vectors in the 2D VQ gain 
codebook 412 actually represent the adaptive codebook gain 
and a correction factor for the fixed codebook gain as 
previously discussed. 
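
Assuming Equations 18 through 20 follow the same pattern as Equations 23 and 24 later in this description, with h(n) taken to be the impulse response of the weighted synthesis filter and * denoting convolution, the three expressions can be sketched as follows. The summation limits 0 to 79 reflect the 80-sample example above; the symbol E for the quantization error is introduced here only for illustration, and the published forms may differ in detail.

```latex
% Equation 18 (assumed form): unquantized gains by WMSE minimization
\{g_a, g_c\} = \arg\min_{g_a,\, g_c} \sum_{n=0}^{79}
  \Bigl( t(n) - \bigl( g_a\, v_a(n) * h(n) + g_c\, v_c(n) * h(n) \bigr) \Bigr)^2

% Equation 19 (assumed form): modified target from the modified gains
t''(n) = g'_a\, v_a(n) * h(n) + g'_c\, v_c(n) * h(n)

% Equation 20 (assumed form): error minimized over the 2D VQ gain codebook
E = \sum_{n=0}^{79}
  \Bigl( t''(n) - \bigl( \hat{g}_a\, v_a(n) * h(n) + \hat{g}_c\, v_c(n) * h(n) \bigr) \Bigr)^2
```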

Following determination of the modified target signal 
434, the quantized gain vector (ĝ_ac) 433 is passed to 
multipliers 414, 416. The third multiplier 414 multiplies the 
best vector for the adaptive codebook vector (v_a) 382 from 
the adaptive codebook 368 with the quantized adaptive 
codebook gain (ĝ_a) 435. The output from the third multiplier 
414 is provided to the adder 418. Similarly, the fourth 
multiplier 416 multiplies the quantized fixed codebook gain 
(ĝ_c) 436 with the best vector for the fixed codebook vector 
(v_c) 402 from the fixed codebook 390. The output from the 
fourth multiplier 416 is also provided to the adder 418. The 
adder 418 adds the outputs from the multipliers 414, 416 and 
provides the resulting signal to the third synthesis filter 420. 

The combination of the third synthesis filter 420 and the 
third perceptual weighting filter 422 generates a third 
resynthesized speech signal 438. As with the first and second 
synthesis filters 372 and 394, the third synthesis filter 420 
receives the quantized LPC coefficients A_q(z) 342. The third 
subtractor 424 subtracts the third resynthesized speech 
signal 438 from the modified target signal 434 to generate a 
third error signal 442. The third minimization module 426 
receives the third error signal 442 that represents the error 
resulting from joint quantization of the fixed codebook gain 
and the adaptive codebook gain by the 2D VQ gain code- 
book 412. The third minimization module 426 uses the 
energy of the third error signal 442 to control the search and 
selection of vectors from the 2D VQ gain codebook 412 in 
order to reduce the energy of the third error signal 442. 

The process repeats until the third minimization module 
426 has selected the best vector from the 2D VQ gain 
codebook 412 for each subframe that minimizes the energy 
of the third error signal 442. Once the energy of the third 
error signal 442 has been minimized for each subframe, the 
index locations of the jointly quantized gains (ĝ_a) and (ĝ_c) 
435 and 436 are used to generate the gain component 147, 
179 for the frame. For the full-rate codec 22, the gain 
component 147 is the fixed and adaptive gain component 
148a, 150a and for the half-rate codec 24, the gain compo- 
nent 179 is the adaptive and fixed gain component 180a and 
182a. 
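
The two stages of the gain quantization section can be sketched together: a joint least-squares solution for the unquantized gains, followed by an exhaustive search of a small gain table. This is a simplified illustration under stated assumptions: `ya` and `yc` are the adaptive and fixed codebook vectors already passed through the weighted synthesis filter, the table entries hold the two gains directly (whereas the text states that the actual table stores the adaptive codebook gain and a correction factor for the fixed codebook gain), and all names and table values are hypothetical.

```python
from typing import Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def joint_gains(
    t: Sequence[float], ya: Sequence[float], yc: Sequence[float]
) -> Tuple[float, float]:
    """Minimize sum (t - ga*ya - gc*yc)^2 via the 2x2 normal equations."""
    raa, rcc, rac = dot(ya, ya), dot(yc, yc), dot(ya, yc)
    rta, rtc = dot(t, ya), dot(t, yc)
    det = raa * rcc - rac * rac
    if abs(det) < 1e-12:
        raise ValueError("filtered vectors are (nearly) collinear")
    ga = (rta * rcc - rtc * rac) / det
    gc = (rtc * raa - rta * rac) / det
    return ga, gc


def search_gain_table(
    target: Sequence[float],
    ya: Sequence[float],
    yc: Sequence[float],
    table: Sequence[Tuple[float, float]],
) -> Tuple[int, float]:
    """Return (index, error_energy) of the table entry closest to target."""
    best_index, best_err = -1, float("inf")
    for index, (ga, gc) in enumerate(table):
        err = sum(
            (t - ga * a - gc * c) ** 2 for t, a, c in zip(target, ya, yc)
        )
        if err < best_err:
            best_index, best_err = index, err
    return best_index, best_err
```

A sequential variant would first solve for the adaptive gain alone and then fit the fixed gain to the remaining error; the text selects between the two based on the normalized adaptive codebook correlation.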

The synthesis filters 372, 394 and 420, the perceptual 
weighting filters 374, 396 and 422, the minimization mod- 
ules 378, 400 and 426, the multipliers 370, 392, 414 and 416, 
the adder 418, and the subtractors 376, 398 and 424 (as well 
as any other filter, minimization module, multiplier, adder, 
and subtractor described in this application) may be replaced 
by any other device, or modified in a manner known to those 
of ordinary skill in the art, that may be appropriate for the 
particular application. 

4.2 Excitation Processing Module for Type One Frames of 
the Full-Rate Codec and the Half-Rate Codec 

In FIG. 11, the F1, H1 first frame processing modules 72 
and 82 include a 3D/4D open loop VQ module 454. The F1, 
H1 second sub-frame processing modules 74 and 84 of one 
embodiment include the adaptive codebook 368, the fixed 
codebook 390, a first multiplier 456, a second multiplier 
458, a first synthesis filter 460, and a second synthesis filter 
462. In addition, the F1, H1 second sub-frame processing 
modules 74 and 84 include a first perceptual weighting filter 
464, a second perceptual weighting filter 466, a first 
subtractor 468, a second subtractor 470, a first minimization 
module 472, and an energy adjustment module 474. The F1, 
H1 second frame processing modules 76 and 86 include a 
third multiplier 476, a fourth multiplier 478, an adder 480, 
a third synthesis filter 482, a third perceptual weighting filter 
484, a third subtractor 486, a buffering module 488, a second 
minimization module 490 and a 3D/4D VQ gain codebook 
492. 

The processing of frames classified as Type One within 
the excitation-processing module 54 provides processing on 
both a frame basis and a sub-frame basis, as previously 
discussed. For purposes of brevity, the following discussion 
will refer to the modules within the full-rate codec 22. The 
modules in the half-rate codec 24 may be considered to 
function similarly, unless otherwise noted. Quantization of 
the adaptive codebook gain by the F1 first frame-processing 
module 72 generates the adaptive gain component 148b. The 
F1 second subframe-processing module 74 and the F1 
second frame-processing module 76 operate to determine the 
fixed codebook vector and the corresponding fixed codebook 
gain, respectively, as previously set forth. The F1 second 
subframe-processing module 74 uses the track tables, as 
previously discussed, to generate the fixed codebook 
component 146b as illustrated in FIG. 2. 

The F1 second frame-processing module 76 quantizes the 
fixed codebook gain to generate the fixed gain component 
150b. In one embodiment, the full-rate codec 22 uses 10 bits 
for the quantization of 4 fixed codebook gains, and the 
half-rate codec 24 uses 8 bits for the quantization of the 3 
fixed codebook gains. The quantization may be performed 
using moving average prediction. In general, before the 
prediction and the quantization are performed, the prediction 
states are converted to a suitable dimension. 
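
The text does not spell out the predictor here, so the following is only a generic sketch of moving average (MA) gain prediction as commonly used in CELP coders, not this codec's actual scheme. The coefficients, the state length, and the use of a decibel domain are all assumptions made for illustration; mean-energy removal and codevector energy normalization are omitted.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical MA prediction coefficients (most recent state first).
MA_COEFFS = (0.6, 0.3, 0.1)


def predict_gain(
    past_errors_db: Sequence[float], coeffs: Sequence[float] = MA_COEFFS
) -> float:
    """Predict a gain from past quantized prediction errors (in dB)."""
    predicted_db = sum(b * e for b, e in zip(coeffs, past_errors_db))
    return 10.0 ** (predicted_db / 20.0)


def apply_correction(
    correction: float,
    past_errors_db: Sequence[float],
    coeffs: Sequence[float] = MA_COEFFS,
) -> Tuple[float, List[float]]:
    """Scale the predicted gain by a quantized correction factor and
    shift the correction (in dB) into the predictor state."""
    gain = correction * predict_gain(past_errors_db, coeffs)
    new_state = [20.0 * math.log10(correction)] + list(past_errors_db[:-1])
    return gain, new_state
```

Because only the correction factor is transmitted, a decoder holding the same state can rebuild the gain with the same two functions.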

4.2.1 First Frame Processing Module 

One embodiment of the 3D/4D open loop VQ module 454 
may be the previously discussed four-dimensional pre vector 
quantizer (4D pre VQ) 166 and associated pre-gain 
quantization table for the full-rate codec 22. Another 
embodiment of the 3D/4D open loop VQ module 454 may be 
the previously discussed three-dimensional pre vector 
quantizer (3D pre VQ) 198 and associated pre-gain 
quantization table for the half-rate codec 24. The 3D/4D 
open loop VQ module 454 receives the unquantized pitch 
gains 352 from the pitch pre-processing module 322. The 
unquantized pitch gains 352 represent the adaptive codebook 
gain for the open loop pitch lag, as previously discussed. 

The 3D/4D open loop VQ module 454 quantizes the 
unquantized pitch gains 352 to generate a quantized pitch 
gain (ĝ^k_a) 496 representing the best quantized pitch gains 
for each subframe, where k is the subframe number. In one 
embodiment, there are four subframes for the full-rate codec 
22 and three subframes for the half-rate codec 24 which 
correspond to four quantized gains (ĝ^1_a, ĝ^2_a, ĝ^3_a, ĝ^4_a) 
and three quantized gains (ĝ^1_a, ĝ^2_a, ĝ^3_a) of each 
subframe, respectively. The index location of the quantized 
pitch gain (ĝ^k_a) 496 within the pre-gain quantization table 
represents the adaptive gain component 148b for the full-rate 
codec 22 or the adaptive gain component 180b for the 
half-rate codec 24. The quantized pitch gain (ĝ^k_a) 496 is 
provided to the F1 second subframe-processing module 74 or 
the H1 second subframe-processing module 84. 
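
Open-loop vector quantization of the pitch gains amounts to a nearest-neighbor lookup in the pre-gain quantization table. The sketch below assumes a plain squared-error distance and a tiny made-up table; the real table contents and any perceptual weighting of the distance are not given in this passage, and the function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def quantize_pitch_gains(
    gains: Sequence[float], table: Sequence[Sequence[float]]
) -> Tuple[int, List[float]]:
    """Return (index, entry) of the table vector closest to `gains`.

    gains -- unquantized pitch gains, one per subframe (4 or 3 values)
    table -- hypothetical pre-gain quantization table
    """
    best_index, best_dist = -1, float("inf")
    for index, entry in enumerate(table):
        if len(entry) != len(gains):
            raise ValueError("table entry has the wrong dimension")
        dist = sum((g - q) ** 2 for g, q in zip(gains, entry))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index, list(table[best_index])
```

The returned index plays the role of the adaptive gain component placed in the bitstream, and the returned entry supplies the per-subframe quantized gains used in the subframe processing that follows.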

4.2.2 Second Sub-Frame Processing Module 

The F1 or H1 second subframe-processing module 74 or 
84 uses the pitch track 348 provided by the pitch 
pre-processing module 322 to identify an adaptive codebook 
vector (v^k_a) 498. The adaptive codebook vector (v^k_a) 498 
represents the adaptive codebook contribution for each 
subframe where k equals the subframe number. In one 
embodiment, there are four subframes for the full-rate codec 
22 and three subframes for the half-rate codec 24 which 
correspond to four vectors (v^1_a, v^2_a, v^3_a, v^4_a) and 
three vectors (v^1_a, v^2_a, v^3_a) for the adaptive codebook 
contribution for each subframe, respectively. 

The vector selected for the adaptive codebook vector 
(v^k_a) 498 may be derived from past vectors located in the 
adaptive codebook 368 and the pitch track 348, where the 
pitch track 348 may be interpolated and is represented by 
L_p(n). Accordingly, no search is required. The adaptive 
codebook vector (v^k_a) 498 may be obtained by interpolating 
the past adaptive codebook vectors (v^k_a) 498 in the 
adaptive codebook with a 21st order Hamming weighted Sinc 
window by: 

(Equation 21) 

where e(n) is the past excitation, i(L_p(n)) and f(L_p(n)) are 
the integer and fractional part of the pitch lag, respectively, 
and w_s(f,i) is the Hamming weighted Sinc window. 

The adaptive codebook vector (v^k_a) 498 and the quantized 
pitch gain (ĝ^k_a) 496 are multiplied by the first multiplier 
456. The first multiplier 456 generates a signal that is 
processed by the first synthesis filter 460 and the first 
perceptual weighting filter module 464 to provide a first 
resynthesized speech signal 500. The first synthesis filter 
460 receives the quantized LPC coefficients A_q(z) 342 from 
the LSF quantization module 334 as part of the processing. 
The first subtractor 468 subtracts the first resynthesized 
speech signal 500 from the modified weighted speech 350 
provided by the pitch pre-processing module 322 to generate 
a long-term error signal 502. 

The F1 or H1 second subframe-processing module 74 or 
84 also performs a search for the fixed codebook 
contribution that is similar to that performed by the F0 or H0 
first subframe-processing module 70 and 80, previously 
discussed. Vectors for a fixed codebook vector (v^k_c) 504 
that represents the long-term residual for a subframe are 
selected from the fixed codebook 390 during the search. The 
second multiplier 458 multiplies the fixed codebook vector 
(v^k_c) 504 by a gain (g^k_c) 506 where k is the subframe 
number. The gain (g^k_c) 506 is unquantized and represents 
the fixed codebook gain for each subframe. The resulting 
signal is processed by the second synthesis filter 462 and the 
second perceptual weighting filter 466 to generate a second 
resynthesized speech signal 508. The second resynthesized 
speech signal 508 is subtracted from the long-term error 
signal 502 by the second subtractor 470 to produce a fixed 
codebook error signal 510. 

The fixed codebook error signal 510 is received by the 
first minimization module 472 along with the control 
information 356. The first minimization module 472 operates 
the same as the previously discussed second minimization 
module 400 illustrated in FIG. 10. The search process repeats 
until the first minimization module 472 has selected the best 
vector for the fixed codebook vector (v^k_c) 504 from the 
fixed codebook 390 for each subframe. The best vector for 
the fixed codebook vector (v^k_c) 504 minimizes the energy 
of the fixed codebook error signal 510. The indices identify 
the best vector for the fixed codebook vector (v^k_c) 504, as 
previously discussed, and form the fixed codebook component 
146b and 178b. 

4.2.2.1 Fixed Codebook Search for Full-Rate Codec 

In one embodiment, the 8-pulse codebook 162, illustrated 
in FIG. 4, is used for each of the four subframes for frames 
of Type One by the full-rate codec 22, as previously 
discussed. The target for the fixed codebook vector (v^k_c) 
504 is the long-term error signal 502, as previously 
described. The long-term error signal 502, represented by 
t'(n), is determined based on the modified weighted speech 
350, represented by t(n), with the adaptive codebook 
contribution from the initial frame processing module 44 
removed according to: 

(Equation 22) 

During the search for the best vector for the fixed codebook 
vector (v^k_c) 504, pitch enhancement may be applied in 
the forward direction. In addition, the search procedure 
minimizes the fixed codebook residual 508 using an iterative 
search procedure with controlled complexity to determine 
the best vector for the fixed codebook vector (v^k_c) 504. An 
initial fixed codebook gain represented by the gain (g^k_c) 506 
is determined during the search. The indices identify the best 
vector for the fixed codebook vector (v^k_c) 504 and form the 
fixed codebook component 146b as previously discussed. 
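
The interpolation that Section 4.2.2 associates with Equation 21 can be sketched as a fractional-delay read from the past excitation. The form assumed here is v_a(n) = sum over i = -10..10 of w_s(f, i) * e(n - I + i), with I and f the integer and fractional parts of the lag; this is a guess at the general shape, not the published equation. The window definition, its half-width of 11, the constant lag over the subframe, and all function names are assumptions.

```python
import math
from typing import List, Sequence

HALF_TAPS = 10  # i runs over -10..10 (21 taps)


def windowed_sinc(frac: float, i: int, half_width: float = HALF_TAPS + 1.0) -> float:
    """Hamming-weighted sinc tap for offset i and fractional lag frac.

    The window shape and half_width are assumptions for this sketch.
    """
    x = i + frac
    sinc = 1.0 if x == 0.0 else math.sin(math.pi * x) / (math.pi * x)
    hamming = 0.54 + 0.46 * math.cos(math.pi * x / half_width)
    return sinc * hamming


def adaptive_codebook_vector(
    past_excitation: Sequence[float], lag: float, length: int
) -> List[float]:
    """Build `length` samples delayed by `lag` from the past excitation.

    A single lag is used for the whole vector; a per-sample pitch track
    L_p(n) would recompute int_lag and frac inside the loop.
    """
    int_lag = int(math.floor(lag))
    frac = lag - int_lag
    if int_lag <= HALF_TAPS:
        raise ValueError("lag must exceed the interpolation half-length")
    buf = list(past_excitation)
    start = len(buf)
    if start < int_lag + HALF_TAPS:
        raise ValueError("not enough past excitation for this lag")
    for n in range(length):
        pos = start + n - int_lag  # centre of the interpolation window
        acc = 0.0
        for i in range(-HALF_TAPS, HALF_TAPS + 1):
            acc += windowed_sinc(frac, i) * buf[pos + i]
        buf.append(acc)  # new samples extend the buffer for short lags
    return buf[start:]
```

For an integer lag the taps collapse to a single unit tap, so the output is simply the excitation delayed by the lag; when the lag is shorter than the vector, already generated samples are reused, which is why each new sample is appended to the buffer.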

4.2.2.2 Fixed Codebook Search for Half-Rate Codec 

In one embodiment, the long-term residual is represented 
with 13 bits for each of the three subframes for frames 
classified as Type One for the half-rate codec 24, as 
previously discussed. The long-term residual may be 
determined in a similar manner to the fixed codebook search 
in the full-rate codec 22. Similar to the fixed-codebook 
search for the half-rate codec 24 for frames of Type Zero, the 
high-frequency noise injection, the additional pulses that are 
determined by high correlation in the previous subframe, 
and the weak short-term spectral filter may be introduced 
into the impulse response of the second synthesis filter 462. 
In addition, forward pitch enhancement also may be 
introduced into the impulse response of the second synthesis 
filter 462. 

In one embodiment, a full search is performed for the 
2-pulse codebook 196 and the 3-pulse codebook 197 as 
illustrated in FIG. 5. The pulse codebook 196, 197 and the 
best vector for the fixed codebook vector (v^k_c) 504 that 
minimizes the fixed codebook error signal 510 are selected 
for the representation of the long-term residual for each 
subframe. In addition, an initial fixed codebook gain 
represented by the gain (g^k_c) 506 may be determined during 
the search similar to the full-rate codec 22. The indices 
identify the best vector for the fixed codebook vector (v^k_c) 
504 and form the fixed codebook component 178b. 

As previously discussed, the F1 or H1 second 
subframe-processing module 74 or 84 operates on a subframe 
basis. However, the F1 or H1 second frame-processing module 
76 or 86 operates on a frame basis. Accordingly, parameters 
determined by the F1 or H1 second subframe-processing 
module 74 or 84 may be stored in the buffering module 488 
for later use on a frame basis. In one embodiment, the 
parameters stored are the best vector for the adaptive 
codebook vector (v^k_a) 498 and the best vector for the fixed 
codebook vector (v^k_c) 504. In addition, a modified target 
signal 512 and the gains (ĝ^k_a), (g^k_c) 496 and 506 
representing the initial adaptive and fixed codebook gains 
may be stored. Generation of the modified target signal 512 
will be described later. 

At this time, the best vector for the adaptive codebook 
vector (v^k_a) 498, the best vector for the fixed codebook 
vector (v^k_c) 504, and the best pitch gains for the quantized 
pitch gain (ĝ^k_a) 496 have been identified. Using these best 
vectors and best pitch gains, the best fixed codebook gains 
for the gain (g^k_c) 506 will be determined. The best fixed 
codebook gains for the gain (g^k_c) 506 will replace the 
unquantized initial fixed codebook gains determined 
previously for the gain (g^k_c) 506. To determine the best 
fixed codebook gains, a joint delayed quantization of the 
fixed-codebook gains for each subframe is performed by the 
second frame-processing module 76 and 86. 

4.2.3 Second Frame Processing Module 

The second frame processing module 76 and 86 is operable 
on a frame basis to generate the fixed codebook gain 
represented by the fixed gain component 150b and 182b. 
The modified target 512 is first determined in a manner 
similar to the gain determination and quantization of the 
frames classified as Type Zero. The modified target 512 is 
determined for each subframe and is represented by t''(n). 
The modified target may be derived using the best vectors 
for the adaptive codebook vector (v^k_a) 498 and the fixed 
codebook vector (v^k_c) 504, as well as the adaptive codebook 
gain and the initial value of the fixed codebook gain derived 
from Equation 18 by: 

\[ t''(n) = g_a\, v_a(n) * h(n) + g_c\, v_c(n) * h(n) \] 
(Equation 23) 

An initial value for the fixed codebook gain for each 
subframe to be used in the search may be obtained by 
minimizing: 

\[ \{g_c\} = \arg\min \Bigl\{ \sum_{n} \Bigl( t(n) - \bigl( (\hat{g}_a\, v_a(n) * h(n)) + (g_c\, v_c(n) * h(n)) \bigr) \Bigr)^2 \Bigr\} \] 
(Equation 24) 

where v_a(n) is the adaptive-codebook contribution for a 
particular subframe and v_c(n) is the fixed-codebook 
contribution for a particular subframe. In addition, ĝ_a is the 
quantized and normalized adaptive-codebook gain for a 
particular subframe that is one of the elements of a quantized 
fixed codebook gain (ĝ^k_c) 513. The calculated fixed 
codebook gain g_c is further normalized and corrected, to 
provide the best energy match between the third 
resynthesized speech signal and the modified target signal 
512 that has been buffered. Unquantized fixed-codebook 
gains from the previous subframes may be used to generate 
the adaptive codebook vector (v^k_a) 498 for the processing 
of the next subframe according to Equation 21. 

The search for vectors for the quantized fixed codebook 
gain (ĝ^k_c) 513 is performed within the 3D/4D VQ gain 
codebook 492. The 3D/4D VQ gain codebook 492 may be 
the previously discussed multidimensional gain quantizer 
and associated gain quantization table. In one embodiment, 
the 3D/4D VQ gain codebook 492 may be the previously 
discussed 4D delayed VQ gain quantizer 168 for the full-rate 
codec 22. As previously discussed, the 4D delayed VQ gain 
quantizer 168 may be operable using the associated delayed 
gain quantization table illustrated as Table 5. In another 
embodiment, the 3D/4D VQ gain codebook 492 may be the 
previously discussed 3D delayed VQ gain quantizer for the 
half-rate codec 24. The 3D delayed VQ gain quantizer may 
be operable using the delayed gain quantization table 
illustrated as the previously discussed Table 8. 

The 3D/4D VQ gain codebook 492 may be searched for 
vectors for the quantized fixed codebook gain (ĝ^k_c) 513 that 
minimize the energy similar to the previously discussed 2D 
VQ gain codebook 412 of FIG. 10. The quantized vectors in 
the 3D/4D VQ gain codebook 492 actually represent a 
correction factor for the predicted fixed codebook gain as 
previously discussed. During the search, the third multiplier 
476 multiplies the adaptive codebook vector (v^k_a) 498 by 
the quantized pitch gain (ĝ^k_a) 496 following determination 
of the modified target 512. In addition, the fourth multiplier 
478 multiplies the fixed codebook vector (v^k_c) 504 by the 
quantized fixed codebook gain (ĝ^k_c) 513. The adder 480 
adds the resulting signals from the multipliers 476 and 478. 
The resulting signal from the adder 480 is passed through 
the third synthesis filter 482 and the perceptual weighting 
filter module 484 to generate a third resynthesized speech 
signal 514. As with the first and second synthesis filters 460, 
462, the third synthesis filter 482 receives the quantized LPC 
coefficients A_q(z) 342 from the LSF quantization module 
334 as part of the processing. The third subtractor 486 
subtracts the third resynthesized speech signal 514 from the 
modified target signal 512 that was previously stored in the 
buffering module 488. The resulting signal is the weighted 
mean squared error referred to as a third error signal 516. 

The third minimization module 490 receives the third 
error signal 516 that represents the error resulting from 
quantization of the fixed codebook gain by the 3D/4D VQ 
gain codebook 492. The third minimization module 490 uses 
the third error signal 516 to control the search and selection 
of vectors from the 3D/4D VQ gain codebook 492 in order 
to reduce the energy of the third error signal 516. The search 
process repeats until the third minimization module 490 has 
selected the best vector from the 3D/4D VQ gain codebook 
492 for each subframe that minimizes the error in the third 
error signal 516. Once the energy of the third error signal 
516 has been minimized, the index location of the quantized 
fixed codebook gain (g_c^k) 513 in the 3D/4D VQ gain
codebook 492 is used to generate the fixed codebook gain
component 150b for the full-rate codec 22, and the fixed
codebook gain component 182b for the half-rate codec 24.

4.2.3.1 3D/4D VQ Gain Codebook 

In one embodiment, when the 3D/4D VQ gain codebook 
492 is a 4-dimensional codebook, it may be searched in 
order to minimize 

$$\sum_{k=1}^{4} \sum_{n=0}^{39} \left( t^k(n) - \left( \hat{g}_a^k\, v_a^k(n) * h(n) + \hat{g}_c^k\, v_c^k(n) * h(n) \right) \right)^2 \quad \text{(Equation 25)}$$

where the quantized pitch gains {ĝ_a^1, ĝ_a^2, ĝ_a^3, ĝ_a^4} originate
from the initial frame processing module 44, and {t^1(n), t^2(n),
t^3(n), t^4(n)}, {v_a^1(n), v_a^2(n), v_a^3(n), v_a^4(n)}, and
{v_c^1(n), v_c^2(n), v_c^3(n), v_c^4(n)} may be buffered during the
subframe processing as previously discussed. In an example embodiment,
the fixed codebook gains {ĝ_c^1, ĝ_c^2, ĝ_c^3, ĝ_c^4} are derived from a
10-bit codebook, where the entries of the codebook contain
a 4-dimensional correction factor for the predicted fixed
codebook gains as previously discussed. In addition, n=40 to
represent 40 samples per frame.
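
The joint search described above can be sketched as follows. This is only an illustration under stated assumptions, not the listing from the microfiche appendix: the function name `search_gain_codebook`, the list-based signal layout, and the simplification that each codebook entry directly holds candidate fixed codebook gains (rather than correction factors applied to predicted gains) are hypothetical.

```python
def search_gain_codebook(targets, ya, yc, ga, codebook):
    """Return (index, error) of the entry minimizing the summed squared error.

    targets[k][n]  : target t^k(n) for subframe k
    ya[k][n]       : filtered adaptive codebook vector v_a^k(n) * h(n)
    yc[k][n]       : filtered fixed codebook vector v_c^k(n) * h(n)
    ga[k]          : quantized pitch gain for subframe k
    codebook[i][k] : candidate fixed codebook gain for subframe k
                     (assumed layout; the source stores correction factors)
    """
    best_index, best_error = -1, float("inf")
    for index, gains in enumerate(codebook):
        error = 0.0
        # Accumulate the error over every subframe, as in the joint criterion.
        for k, gc in enumerate(gains):
            for t, a, c in zip(targets[k], ya[k], yc[k]):
                diff = t - (ga[k] * a + gc * c)
                error += diff * diff
        if error < best_error:
            best_index, best_error = index, error
    return best_index, best_error
```

The same routine covers the 3-dimensional case, because the number of subframes is taken from the length of each codebook entry.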

In another embodiment, when the 3D/4D VQ gain code- 
book 492 is a 3-dimensional codebook, it may be searched 
in order to minimize 
$$\sum_{k=1}^{3} \sum_{n} \left( t^k(n) - \left( \hat{g}_a^k\, v_a^k(n) * h(n) + \hat{g}_c^k\, v_c^k(n) * h(n) \right) \right)^2 \quad \text{(Equation 26)}$$

where the quantized pitch gains {ĝ_a^1, ĝ_a^2, ĝ_a^3} originate from
the initial frame processing module 44, and {t^1(n), t^2(n), t^3(n)},
{v_a^1(n), v_a^2(n), v_a^3(n)}, and {v_c^1(n), v_c^2(n), v_c^3(n)} may be
buffered during the subframe processing as previously discussed.
In an example embodiment, the fixed codebook
gains {ĝ_c^1, ĝ_c^2, ĝ_c^3} are derived from an 8-bit codebook where
the entries of the codebook contain a 3-dimensional correc- 
tion factor for the predicted fixed codebook gains. The 
prediction of the fixed-codebook gains may be based on 
moving average prediction of the fixed codebook energy in 
the log domain. 
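
A moving average prediction of this kind can be sketched as below. The predictor values are the half-rate coefficients recited in claim 19; everything else is an assumption made for illustration: the function names, the ordering of `past_errors` (most recent first), the use of a decibel-style log domain (20·log10), the additive form of the correction, and the omission of any mean energy offset.

```python
# Half-rate predictor coefficients, one row per subframe (values from claim 19).
PREDICTOR = [
    [0.6, 0.3, 0.1],
    [0.4, 0.25, 0.1],
    [0.3, 0.15, 0.075],
]

def predict_energies(past_errors, predictor=PREDICTOR):
    """Predict the log-domain energy of each subframe.

    past_errors holds quantized energy errors of the previous frame,
    most recent first (an assumed convention).
    """
    return [sum(b * e for b, e in zip(row, past_errors)) for row in predictor]

def gains_from_corrections(past_errors, corrections):
    """Add a correction vector to the prediction and leave the log domain.

    The 20*log10 mapping between gain and energy is an assumption.
    """
    predicted = predict_energies(past_errors)
    return [10.0 ** ((p + c) / 20.0) for p, c in zip(predicted, corrections)]
```

With all past errors at zero the prediction is zero, so the decoded gain depends only on the correction vector selected from the codebook.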

5.0 Decoding System

Referring now to FIG. 12, an expanded block diagram 
representing the full and half-rate decoders 90 and 92 of 
FIG. 3 is illustrated. The full or half-rate decoders 90 or 92 
include the excitation reconstruction modules 104, 106, 114
and 116 and the linear prediction coefficient (LPC) reconstruction
modules 107 and 118. One embodiment of each of
the excitation reconstruction modules 104, 106, 114 and 116
includes the adaptive codebook 368, the fixed codebook
390, the 2D VQ gain codebook 412, the 3D/4D open loop
VQ codebook 454, and the 3D/4D VQ gain codebook 492.
The excitation reconstruction modules 104, 106, 114 and
116 also include a first multiplier 530, a second multiplier
532 and an adder 534. In one embodiment, the LPC reconstruction
modules 107, 118 include an LSF decoding module
536 and an LSF conversion module 538. In addition, the
half-rate codec 24 includes the predictor switch module 336,
and the full-rate codec 22 includes the interpolation module
338.

Also illustrated in FIG. 12 are the synthesis filter module
98 and the post-processing module 100. In one embodiment,
the post-processing module 100 includes a short-term post
filter module 540, a long-term filter module 542, a tilt
compensation filter module 544, and an adaptive gain control
module 546. According to the rate selection, the bitstream
may be decoded to generate the post-processed
synthesized speech 20. The decoders 90 and 92 perform
inverse mapping of the components of the bitstream to
algorithm parameters. The inverse mapping may be followed
by a type classification dependent synthesis within the
full and half-rate codecs 22 and 24.

The decoding for the quarter-rate codec 26 and the
eighth-rate codec 28 is similar to that of the full and half-rate
codecs 22 and 24. However, the quarter and eighth-rate
codecs 26 and 28 use vectors of similar yet random numbers
and the energy gain, as previously discussed, instead of the
adaptive and the fixed codebooks 368 and 390 and associated
gains. The random numbers and the energy gain may be
used to reconstruct an excitation energy that represents the
short-term excitation of a frame. The LPC reconstruction
modules 122 and 126 are also similar to those of the full and
half-rate codecs 22 and 24, with the exception of the predictor
switch module 336 and the interpolation module 338.

5.1 Excitation Reconstruction

Within the full and half-rate decoders 90 and 92, operation
of the excitation reconstruction modules 104, 106, 114 and
116 is largely dependent on the type classification provided
by the type component 142 and 174. The adaptive codebook
368 receives the pitch track 348. The pitch track 348 is
reconstructed by the decoding system 16 from the adaptive
codebook component 144 and 176 provided in the bitstream
by the encoding system 12. Depending on the type classification
provided by the type component 142 and 174, the
adaptive codebook 368 provides a quantized adaptive codebook
vector (v_a^k) 550 to the multiplier 530. The multiplier
530 multiplies the quantized adaptive codebook vector (v_a^k)
550 with an adaptive codebook gain vector (g_a^k) 552. The
selection of the adaptive codebook gain vector (g_a^k) 552 also
depends on the type classification provided by the type
component 142 and 174.

In an example embodiment, if the frame is classified as
Type Zero in the full-rate codec 22, the 2D VQ gain
codebook 412 provides the adaptive codebook gain vector
(g_a^k) 552 to the multiplier 530. The adaptive codebook gain
vector (g_a^k) 552 is determined from the adaptive and fixed
codebook gain component 148a and 150a. The adaptive
codebook gain vector (g_a^k) 552 is the same as part of the best
vector for the quantized gain vector (ĝ_ac) 433 determined by
the gain and quantization section 366 of the F0 first subframe
processing module 70 as previously discussed. The
quantized adaptive codebook vector (v_a^k) 550 is determined
from the closed loop adaptive codebook component 144b.
Similarly, the quantized adaptive codebook vector (v_a^k) 550
is the same as the best vector for the adaptive codebook
vector (v_a) 382 determined by the F0 first subframe processing
module 70.

The 2D VQ gain codebook 412 is two-dimensional and
provides the adaptive codebook gain vector (g_a^k) 552 to the
multiplier 530 and a fixed codebook gain vector (g_c^k) 554 to
the multiplier 532. The fixed codebook gain vector (g_c^k) 554
similarly is determined from the adaptive and fixed codebook
gain component 148a and 150a and is part of the best
vector for the quantized gain vector (ĝ_ac) 433. Also based on
the type classification, the fixed codebook 390 provides a
quantized fixed codebook vector (v_c^k) 556 to the multiplier
532. The quantized fixed codebook vector (v_c^k) 556 is
reconstructed from the codebook identification, the pulse
locations (or the Gaussian codebook 195 for the half-rate
codec 24), and the pulse signs provided by the fixed codebook
component 146a. The quantized fixed codebook vector
(v_c^k) 556 is the same as the best vector for the fixed
codebook vector (v_c) 402 determined by the F0 first subframe
processing module 70 as previously discussed. The
multiplier 532 multiplies the quantized fixed codebook
vector (v_c^k) 556 by the fixed codebook gain vector (g_c^k) 554.

If the type classification of the frame is Type One, a
multi-dimensional vector quantizer provides the adaptive
codebook gain vector (g_a^k) 552 to the multiplier 530, where
the number of dimensions in the multi-dimensional vector
quantizer is dependent on the number of subframes. In one
embodiment, the multi-dimensional vector quantizer may be
the 3D/4D open loop VQ 454. Similarly, a multi-dimensional
vector quantizer provides the fixed codebook
gain vector (g_c^k) 554 to the multiplier 532. The adaptive
codebook gain vector (g_a^k) 552 and the fixed codebook gain
vector (g_c^k) 554 are provided by the gain component 147 and
179 and are the same as the quantized pitch gain (g_a^k) 496
and the quantized fixed codebook gain (g_c^k) 513, respectively.

In frames classified as Type Zero or Type One, the output 
from the first multiplier 530 is received by the adder 534 and 
is added to the output from the second multiplier 532. The 
output from the adder 534 is the short-term excitation. The 
short-term excitation is provided to the synthesis filter 
module 98 on the short-term excitation line 128. 
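
The multiply-and-add structure described above can be sketched for a single subframe as follows. The function name and the use of scalar per-subframe gains are assumptions made for illustration; the comments map each step to the reference numerals in the text.

```python
def reconstruct_excitation(v_a, v_c, g_a, g_c):
    """Combine the codebook contributions for one subframe.

    v_a, v_c : quantized adaptive and fixed codebook vectors (lists of samples)
    g_a, g_c : adaptive and fixed codebook gains for the subframe
    """
    if len(v_a) != len(v_c):
        raise ValueError("codebook vectors must have the same length")
    scaled_adaptive = [g_a * x for x in v_a]   # first multiplier 530
    scaled_fixed = [g_c * x for x in v_c]      # second multiplier 532
    # Adder 534: the sum is the short-term excitation for this subframe.
    return [a + c for a, c in zip(scaled_adaptive, scaled_fixed)]
```

For example, `reconstruct_excitation([1.0, 0.0, -1.0], [0.0, 2.0, 0.0], 0.5, 1.5)` scales the two vectors and sums them sample by sample.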

5.2 LPC Reconstruction

The generation of the short-term (LPC) prediction coef- 
ficients in the decoders 90 and 92 is similar to the processing 
in the encoding system 12. The LSF decoding module 536 
reconstructs the quantized LSFs from the LSF component 
140 and 172. The LSF decoding module 536 uses the same 
LSF prediction error quantization table and LSF predictor 
coefficients tables used by the encoding system 12. For the 
half-rate codec 24, the predictor switch module 336 selects
one of the sets of predictor coefficients to calculate the
predicted LSFs as directed by the LSF component 140, 172.
Interpolation of the quantized LSFs occurs using the same
linear interpolation path used in the encoding system 12. For
the full-rate codec 22, for frames classified as Type Zero, the
interpolation module 338 selects one of the same
interpolation paths used in the encoding system 12 as
directed by the LSF component 140 and 172. The weighting
of the quantized LSFs is followed by conversion to the
quantized LPC coefficients A_q(z) 342 within the LSF conversion
module 538. The quantized LPC coefficients A_q(z)
342 are the short-term prediction coefficients that are supplied
to the synthesis filter 98 on the short-term prediction
coefficients line 130.
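
The two steps described above, rebuilding the quantized LSFs from a prediction plus a decoded error and then weighting previous and current LSFs, might look like the sketch below. It is a simplified illustration: the function names, the `mean_lsf` offset, the ordering of `past_errors`, and the single scalar `weight` per subframe are assumptions, not details taken from the source.

```python
def decode_lsf(decoded_error, past_errors, predictor_set, mean_lsf):
    """Rebuild quantized LSFs as mean + moving-average prediction + decoded error.

    past_errors[j]   : decoded error vector from j+1 frames ago (assumed order)
    predictor_set[j] : matching vector of predictor coefficients; for the
                       half rate this is the set chosen by the predictor switch
    mean_lsf         : assumed fixed offset, not specified in the text
    """
    lsf = []
    for i, err in enumerate(decoded_error):
        predicted = sum(coef[i] * past[i]
                        for coef, past in zip(predictor_set, past_errors))
        lsf.append(mean_lsf[i] + predicted + err)
    return lsf

def interpolate_lsf(previous, current, weight):
    """Linearly weight the previous and current quantized LSFs for one subframe."""
    return [(1.0 - weight) * p + weight * c for p, c in zip(previous, current)]
```

An interpolation path can then be modeled as a list of `weight` values, one per subframe, with the decoder applying whichever path the LSF component indicates.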



5.3 Synthesis Filter 

The quantized LPC coefficients A_q(z) 342 may be used by
the synthesis filter 98 to filter the short-term excitation.
The synthesis filter 98 may be a short-term
inverse prediction filter that generates synthesized speech 
prior to post-processing. The synthesized speech may then 
be passed through the post-processing module 100. The 
short-term prediction coefficients may also be provided to 
the post-processing module 100. 
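
A short-term inverse prediction (all-pole) filter of this kind can be sketched as below. The sign convention, in which each output sample is the excitation plus a weighted sum of past outputs, and the function name `synthesize` are assumptions; codecs that define A(z) with the opposite sign would negate the coefficients.

```python
def synthesize(excitation, lpc, memory=None):
    """All-pole filtering: s(n) = e(n) + sum_i lpc[i-1] * s(n-i).

    excitation : short-term excitation samples
    lpc        : prediction coefficients (assumed sign convention, see above)
    memory     : past output samples, most recent first; zeros if omitted
    Returns the synthesized samples and the updated filter memory.
    """
    order = len(lpc)
    history = list(memory) if memory is not None else [0.0] * order
    output = []
    for e in excitation:
        s = e + sum(a * h for a, h in zip(lpc, history))
        output.append(s)
        # Shift the newest sample into the memory, dropping the oldest.
        history = ([s] + history)[:order]
    return output, history
```

Returning the memory lets the caller carry the filter state from one subframe to the next.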

5.4 Post-Processing 

The post-processing module 100 processes the synthesized
speech based on the rate selection and the short-term
prediction coefficients. The short-term post filter module 540 
may be first to process the synthesized speech. Filtering 
parameters within the short-term post filter module 540 may 
be adapted according to the rate selection and the long-term 
spectral characteristic determined by the characterization 
module 328 as previously discussed with reference to FIG. 
9. The short-term post filter may be described by: 
(Equation 27)

where in an example embodiment, γ_1 [...]
and γ_2 = 0.75, and r_0 is determined based on the rate selection
and the long-term spectral characteristic. Processing continues
in the long term filter module 542.

The long term filter module 542 performs a fine tuning 
search for the pitch period in the synthesized speech. In one 
embodiment, the fine tuning search is performed using pitch 
correlation and rate-dependent gain controlled harmonic
filtering. The harmonic filtering is disabled for the quarter-rate
codec 26 and the eighth-rate codec 28. The tilt compensation
filter module 544, in one embodiment, is a first-order
finite impulse response (FIR) filter. The FIR filter may
be tuned according to the spectral tilt of the perceptual 
weighting filter module 314 previously discussed with ref- 
erence to FIG. 9. The filter may also be tuned according to 
the long-term spectral characteristic determined by the char- 
acterization module 328 also discussed with reference to 
FIG. 9. 
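
A first-order FIR tilt compensation filter can be sketched as below. The form y(n) = x(n) - mu·x(n-1), the parameter name `mu`, and the function name are assumptions for illustration; the text states only that the filter is first-order FIR and that it is tuned from the spectral tilt and the long-term spectral characteristic.

```python
def tilt_compensate(signal, mu, previous_sample=0.0):
    """First-order FIR filter: y(n) = x(n) - mu * x(n-1).

    mu              : tilt factor (hypothetical tuning parameter)
    previous_sample : last input sample of the preceding block
    """
    output = []
    last = previous_sample
    for x in signal:
        output.append(x - mu * last)
        last = x
    return output
```

A positive `mu` attenuates low frequencies relative to high frequencies, which offsets a downward spectral tilt.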

The post filtering may be concluded with an adaptive gain 
control module 546. The adaptive gain control module 546 
brings the energy level of the synthesized speech that has 
been processed within the post-processing module 100 to the 
level of the synthesized speech prior to the post-processing. 
Level smoothing and adaptations may also be performed 
within the adaptive gain control module 546. The result of 
the processing by the post-processing module 100 is the
post-processed synthesized speech 20. 
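
The energy matching performed by the adaptive gain control can be sketched as follows. The function name, the block-wise energy estimate, and the first-order `smoothing` recursion are assumptions; the text specifies only that the post-filtered level is brought back to the level before post-processing and that smoothing may be applied.

```python
import math

def adaptive_gain_control(reference, filtered, smoothing=0.0, previous_gain=1.0):
    """Scale `filtered` so that its energy matches that of `reference`.

    reference     : synthesized speech before post-processing
    filtered      : the same block after the post filters
    smoothing     : 0.0 applies the target gain at once; values near 1.0
                    approach it gradually (hypothetical parameter)
    previous_gain : gain carried over from the preceding block
    """
    ref_energy = sum(x * x for x in reference)
    out_energy = sum(x * x for x in filtered)
    target = math.sqrt(ref_energy / out_energy) if out_energy > 0.0 else 1.0
    gain = previous_gain
    output = []
    for x in filtered:
        # Per-sample level smoothing toward the target gain.
        gain = smoothing * gain + (1.0 - smoothing) * target
        output.append(gain * x)
    return output, gain
```

Returning the final gain allows the smoothing to continue across block boundaries.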

In one embodiment of the decoding system 16, frames 
received by the decoding system 16 that have been erased 
due to, for example, loss of the signal during radio 
transmission, are identified by the decoding system 16. The 
decoding system 16 can subsequently perform a frame 
erasure concealment operation. The operation involves 
extrapolating speech parameters for the erased frame from
the previous frame. The extrapolated speech parameters may 
be used to synthesize the erased frame. In addition, param- 
eter smoothing may be performed to ensure continuous 
speech for the frames that follow the erased frame. In 
another embodiment, the decoding system 16 also includes 
bad rate determination capabilities. Identification of a bad 
rate selection for a frame that is received by the decoding 
system 16 is accomplished by identifying illegal sequences 
of bits in the bitstream and declaring that the particular 
frame is erased. 
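
The concealment flow described above can be sketched as below. Everything in it is hypothetical: the dictionary layout of the frame parameters, the `attenuation` factor applied to the gains, and the caller-supplied `is_legal` check that stands in for the detection of illegal bit sequences. The text states only that parameters are extrapolated from the previous frame and that a bad rate determination marks the frame as erased.

```python
def conceal_erased_frame(previous, attenuation=0.9):
    """Extrapolate parameters for an erased frame from the previous frame."""
    return {
        "lsf": list(previous["lsf"]),            # hold the spectral envelope
        "pitch_lag": previous["pitch_lag"],      # hold the pitch
        # Fade the energy (assumed behavior, not stated in the text).
        "gains": [attenuation * g for g in previous["gains"]],
    }

def decode_or_conceal(frame, previous, is_legal):
    """Use received parameters unless the frame is missing or fails the check."""
    if frame is None or not is_legal(frame):
        return conceal_erased_frame(previous)
    return frame
```

A frame that arrives but fails the legality check is handled exactly like a lost frame, mirroring the bad rate determination described in the text.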

The previously discussed embodiments of the speech
compression system 10 perform variable rate speech compression
using the full-rate codec 22, the half-rate codec 24,
the quarter-rate codec 26, and the eighth-rate codec 28. The
codecs 22, 24, 26 and 28 operate with different bit allocations
and bit rates using different encoding approaches to
encode frames of the speech signal 18. The encoding
approaches of the full and half-rate codecs 22 and 24 have
different perceptual matching, different waveform matching
and different bit allocations depending on the type classification
of a frame. The quarter and eighth-rate codecs 26 and
28 encode frames using only parametric perceptual representations.
A Mode signal identifies a desired average bit
rate for the speech compression system 10. The speech
compression system 10 selectively activates the codecs 22,
24, 26 and 28 to balance the desired average bit rate with
optimization of the perceptual quality of the post-processed
synthesized speech 20.

While various embodiments of the invention have been 
described, it will be apparent to those of ordinary skill in the
art that many more embodiments and implementations are 
possible that are within the scope of this invention. 
Accordingly, the invention is not to be restricted except in 
light of the attached claims and their equivalents. 

What is claimed is:

1. A variable rate speech compression system for process- 
ing a speech signal, the variable rate speech compression 
system comprising: 

an encoding system operable to determine a rate selection
and a type classification for the speech signal, the
encoding system comprising:

a line spectrum frequency prediction error quantization
table selectable as a function of the rate selection, the
line spectrum frequency prediction error quantization
table associated with encoding short-term predictor
parameters of the speech signal;
a 2D gain quantization table associated with jointly
encoding an adaptive codebook gain and a fixed
codebook gain of the speech signal when the type
classification is a first type;
a pre-gain quantization table selectable as a function of
the rate selection, the pre-gain quantization table
associated with exclusively encoding the adaptive
codebook gain when the type classification is a
second type;
a delayed gain quantization table selectable as a function
of the rate selection, the delayed gain quantization
table associated with exclusively encoding the
fixed codebook gain when the type classification is
the second type; and
a decoding system in communication with the encoding
system, the decoding system operable to decode the
speech signal with the line spectrum frequency prediction
error quantization table and at least one of:
the 2D gain quantization table,
the pre-gain quantization table, and
the delayed gain quantization table,
as a function of the rate selection and the type classification.

2. The variable rate speech compression system of claim
1, where the line spectrum frequency prediction error quantization
table comprises four stages when the rate selection
is a full rate.

3. The variable rate speech compression system of claim
2, where a first stage comprises 128 quantization vectors.

4. The variable rate speech compression system of claim
1, where the line spectrum frequency prediction error quantization
table comprises a first stage and a second stage each
with 128 quantization vectors and a third stage with 64
quantization vectors when the rate selection is a half rate.

5. The variable rate speech compression system of claim 
4, where the first stage comprises a first quantization vector 
represented as {0.00842379, 0.00868718, 0.01533677, 
0.00423439, -0.00886805, -0.02132286, -0.03152681, 
-0.01975061, -0.01152093, -0.01341948} and a second 
quantization vector represented as {0.02528175, 
0.04259634, 0.03789221, 0.01659535, -0.00266498, 
-0.01529545, -0.01653101, -0.01528401, -0.01047642, 
-0.01127117}. 

6. The variable rate speech compression system of claim 
4, where the second stage comprises a first quantization 
vector represented as {0.00589332, 0.00462334, 
-0.00937151, -0.01478366, 0.00674597, 0.00164302, 
-0.00890749, -0.00091839, 0.00487032, 0.00012026} and 
a second quantization vector represented as {-0.00346857, 
-0.00100200, -0.00418711, -0.01512477, -0.00104209, 
-0.00491133, -0.00209555, 0.00045850, 0.00023339,
0.00567173}. 

7. The variable rate speech compression system of claim 
4, where the third stage comprises a first quantization vector 
represented as {-0.00071405, 0.00244371, 0.00235739, 
-0.00329369, 0.00472867, -0.00361321, -0.00584670, 
0.00863128, 0.00145642, -0.00441746} and a second quan- 
tization vector represented as {0.00242589, -0.00430711, 
-0.00122645, -0.00464764, -0.00017887, -0.00471663,
0.00181162, 0.00249980, -0.00276848, -0.00485697}.

8. The variable rate speech compression system of claim
1, where the encoding system is operable to jointly encode
the fixed codebook gain and the adaptive codebook gain 
with the 2D gain quantization table for each of at least two 
subframes of a frame of the speech signal. 

9. The variable rate speech compression system of claim 
1, where the 2D gain quantization table comprises 128 
quantization vectors of 2 elements each.

10. The variable rate speech compression system of claim 
1, where the 2D gain quantization table comprises a first 
quantization vector represented as {1.13718400, 
2.00167200} and a second quantization vector represented 
as {1.15061100, 0.80219900} when the rate selection is a 
full rate. 

11. The variable rate speech compression system of claim 
1, where the pre-gain quantization table comprises 64 vec- 
tors when the rate selection is a full rate. 

12. The variable rate speech compression system of claim 
1, where the pre-gain quantization table comprises 16 vec- 
tors when the rate selection is a half rate. 

13. The variable rate speech compression system of claim 
1, where the pre-gain quantization table comprises a first
vector represented as {0.60699869, 0.59090763,
0.64920781, 0.64610492} and a second vector represented
as {0.68101613, 0.65403889, 0.64210982, 0.63130892} 
when the rate selection is a full rate. 

14. The variable rate speech compression system of claim
1, where the pre-gain quantization table comprises a first
vector represented as {1.16184904, 1.16859789,
1.13656320} and a second vector represented as
{1.14613289, 1.06371877, 0.91852525} when the rate 
selection is a half rate. 

15. The variable rate speech compression system of claim 
1, where the delayed gain quantization table comprises 1024 
vectors when the rate selection is a full rate. 

16. The variable rate speech compression system of claim 
1, where the delayed gain quantization table comprises 256 
vectors when the rate selection is a half rate. 

17. The variable rate speech compression system of claim
1, where the encoding system is operable to encode the 
frame with the delayed gain quantization table and a plu- 
rality of predictor coefficients when the type classification is 
the second type. 

18. The variable rate speech compression system of claim 
17, where the predictor coefficients comprise a first predictor 
coefficient represented as {0.7, 0.6, 0.4, 0.2}, a second 
predictor coefficient represented as {0.4, 0.2, 0.1, 0.05}, a 
third predictor coefficient represented as {0.3, 0.2, 0.075, 
0.025} and a fourth predictor coefficient represented as {0.2,
0.075, 0.025, 0.0} when the rate selection is a full rate.

19. The variable rate speech compression system of claim 
17, where the predictor coefficients comprise a first predictor 
coefficient represented as {0.6, 0.3, 0.1}, a second predictor 
coefficient represented as {0.4, 0.25, 0.1}, and a third
predictor coefficient represented as {0.3, 0.15, 0.075} when 
the rate selection is a half rate. 

20. The variable rate speech compression system of claim
1, where the delayed gain quantization table comprises a first
vector represented as {0.18423671, 0.06523999,
0.13390472} and a second vector represented as
{0.27552690, 0.09702324, 0.05427950} when the rate 
selection is a half rate. 

21. The variable rate speech compression system of claim
1, where the encoding system further comprises a line
spectrum frequency predictor coefficients table associated 
with encoding short-term predictor parameters, the line 
spectrum frequency predictor coefficients table comprising: 

a first set of predictor coefficients; and 

a second set of predictor coefficients; 

wherein the first and the second set of predictor coeffi- 
cients are selectable by the encoding system when the 
rate selection is a half rate. 

22. The variable rate speech compression system of claim 
21, where the first set of predictor coefficients comprises a 
first vector represented as {0.45782564, 0.59002827, 
0.73704688, 0.73388197, 0.75903791, 0.74076479, 
0.65966007, 0.58070788, 0.52280647, 0.42738207} and a 
second vector represented as {0.19087084, 0.26721569, 
0.38110463, 0.39655069, 0.43984539, 0.42178869,
0.34869783, 0.28691864, 0.23847475, 0.17468375}. 

23. The variable rate speech compression system of claim 
21, where the second set of predictor coefficients comprises 
a first vector represented as {0.14936742, 0.25397094, 
0.42536339, 0.40318214, 0.39778242, 0.34731435,
0.22773174, 0.17583478, 0.12497067, 0.11001108} and a
second vector represented as {0.09932127, 0.15389237, 
0.24021347, 0.24507006, 0.26478926, 0.23018456, 
0.15178193, 0.11368182, 0.07674584, 0.06122567}. 

24. A variable rate speech compression system for processing
a speech signal, the variable rate speech compression
system comprising:

an encoding system operable to determine a bit rate and 
a type classification for the speech signal, the bit rate 
comprising a first rate and a second rate, and the type 
classification comprising a first type and a second type, 
the encoding system comprising: 
a line spectrum frequency prediction error quantization 
table selectable as a function of the bit rate, wherein 
the encoding system is operable to encode short-term 
predictor parameters of the speech signal with the 
line spectrum frequency prediction error quantiza- 
tion table; 

an interpolation module operable with the line spectrum
frequency prediction error quantization table to
encode short-term predictor parameters, when the bit
rate is the first rate and the type classification is the 
first type; 

a line spectrum frequency predictor coefficient table
selectable as a function of the bit rate, wherein the 
encoding system is operable to generate predicted 
line spectrum frequencies with the line spectrum 
frequency predictor coefficient table; and 

a predictor switch module operable with the line spec- 
trum frequency predictor coefficient table to generate 
predicted line spectrum frequencies, when the bit 
rate is the second rate. 

25. The variable rate speech compression system of claim 
24, where the line spectrum frequency prediction error 
quantization table comprises 4 stages when the bit rate is the 
first rate. 

26. The variable rate speech compression system of claim 
24, where the line spectrum frequency prediction error 
quantization table comprises 3 stages when the bit rate is the 
second rate. 

27. The variable rate speech compression system of claim 
24, where the line spectrum frequency predictor coefficients 
table comprises two vectors of predictor coefficients when 
the bit rate is the first rate. 

28. The variable rate speech compression system of claim 
24, where the line spectrum frequency predictor coefficients 
table comprises two sets of four vectors of predictor coef- 
ficients when the bit rate is the second rate, where the 
predictor switch is operable to select one of the sets. 

29. The variable rate speech compression system of claim 
24, where the interpolation module comprises a plurality of 
interpolation paths selectable as a function of variations in 
the spectral envelope of the speech signal. 

30. The variable rate speech compression system of claim 
29, where the interpolation paths comprise 4 interpolation 
paths. 

31. The variable rate speech compression system of claim 
24, where the interpolation module is operable to apply one 
of a plurality of interpolation paths to adjust the contour of 
a spectral envelope of the speech signal. 

32. The variable rate speech compression system of claim 
31, where the interpolation module is operable to determine 
the interpolation paths as a function of a plurality of prede- 
termined weighting factors. 

33. A method of processing a speech signal with a variable 
rate speech compression system, the method comprising: 

determining a rate and a type for the speech signal;

encoding short-term predictor parameters of the speech
signal with a line spectrum frequency prediction error
quantization table as a function of the rate;

jointly encoding an adaptive codebook gain and a fixed
codebook gain of the speech signal with a 2D gain
quantization table when the type is a first type;

encoding the adaptive codebook gain with a pre-gain
quantization table as a function of the rate when the
type is a second type; and

encoding the fixed codebook gain with a delayed gain
quantization table as a function of the rate when the
type is the second type.

34. The method of claim 33, further comprising decoding 
the speech signal with the line spectrum frequency predic- 
tion error quantization table and at least one of the 2D gain 
quantization table, the pre-gain quantization table and the 
delayed gain quantization table as a function of the rate and 
the type. 

35. The method of claim 33, where encoding with the line 
spectrum frequency prediction error quantization table when 
the rate is a half rate comprises: 

selecting one of a first set of predictor coefficients and a 
second set of predictor coefficients from a line spec- 
trum frequency predictor coefficients table; and 

determining predicted line spectrum frequencies with the
selected set of predictor coefficients. 

36. The method of claim 33, where encoding with the line 
spectrum frequency prediction error quantization table when 
the rate is a full rate and the type is the first type comprises: 

selecting one of a plurality of interpolation paths; and 
adjusting the weighting of previously quantized line spec- 
trum frequencies and the weighting of currently quan- 
tized line spectrum frequencies with the interpolation 
path. 

37. The method of claim 33, where encoding with the 
pre-gain quantization table when the rate is a full rate
comprises: 

determining the adaptive codebook gain for each of four 
subframes of a frame of the speech signal; and 

analyzing vectors in the pre-gain quantization table com- 
prising a first vector represented as {0.60699869, 
0.59090763, 0.64920781, 0.64610492} to select one of 
the vectors with elements representing the adaptive 
codebook gain of each of the subframes. 

38. The method of claim 33, where encoding with the 
pre-gain quantization table when the rate is a half rate 
comprises: 

determining the adaptive codebook gain for each of three 
subframes of a frame of the speech signal; and 

analyzing vectors in the pre-gain quantization table com- 
prising a first vector represented as {1.16184904, 
1.16859789, 1.13656320} to select one of the vectors 
with elements representing the adaptive codebook gain 
of each of the subframes. 

39. The method of claim 33, where encoding with the 
delayed gain quantization table when the rate is a half rate 
comprises: 

completing the search in a fixed codebook for each of 
three subframes of a frame of the speech signal; 

determining the fixed codebook gain for each of the 
subframes; and 

analyzing vectors in the delayed gain quantization table 
comprising a first vector represented as {0.18423671, 
0.06523999, 0.13390472} to select one of the vectors 
with elements representing the fixed codebook gain of 
each of the subframes. 

40. The method of claim 33, where encoding with the
delayed gain quantization table when the type is the second
type comprises:

representing the fixed codebook gain for each of a plu- 
rality of subframes of a frame of the speech signal with 
a fixed codebook energy; 

generating a predicted fixed codebook energy for each of 
the subframes with quantized fixed codebook energy 
errors from a plurality of subframes of a previous frame 
and a plurality of predictor coefficients; 

forming a vector with the difference in the fixed codebook 
energy and the predicted fixed codebook energy; and 

selecting a corresponding vector from the delayed gain 
quantization table. 

41. The method of claim 40, where generating the pre- 
dicted fixed codebook energy comprises multiplying the 
quantized fixed codebook energy errors by the predictor 
coefficients, the predictor coefficients comprising a first 
subframe predictor coefficient represented as {0.7, 0.6, 0.4, 
0.2}, a second subframe predictor coefficient represented as 
{0.4, 0.2, 0.1, 0.05}, a third subframe predictor coefficient 
represented as {0.3, 0.2, 0.075, 0.025} and a fourth subframe 
predictor coefficient represented as {0.2, 0.075, 0.025, 0.0}. 

42. The method of claim 40, where generating the pre-
dicted fixed codebook energy comprises multiplying the 
quantized fixed codebook energy errors by the predictor 
coefficients, the predictor coefficients comprising: 

a first predictor coefficient represented as {0.6, 0.3, 0.1};

a second predictor coefficient represented as {0.4, 0.25,
0.1}; and

a third predictor coefficient represented as {0.3, 0.15,
0.075};

wherein the rate selection is a half rate. 

43. The method of claim 33, where jointly encoding with 
the 2D gain quantization table when the rate is the full rate 
comprises analyzing vectors within the 2D gain quantization 
table, the vectors comprising a first vector represented as 
{1.13718400, 2.00167200}. 

44. A method of processing a speech signal, the method 
comprising: 

selecting a bit rate and a type classification;

converting short-term predictor parameters extracted from
the speech signal to line spectrum frequencies;

determining predicted line spectrum frequencies with a
line spectrum frequency predictor coefficients table
when the bit rate selected is a first rate;

determining predicted line spectrum frequencies with a
line spectrum frequency predictor coefficients table and
a predictor switch module when the bit rate selected is
a second rate;

subtracting predicted line spectrum frequencies from line
spectrum frequencies to generate a line spectrum
frequencies prediction error;

quantizing the line spectrum frequencies prediction error
to produce quantized line spectrum frequencies; and

modifying the quantized line spectrum frequencies with
an interpolation module when the bit rate selected is the
first rate and the type classification is a first type;

wherein when the bit rate is the second rate, determining
predicted line spectrum frequencies comprises selecting
one of:

a first set of predictor coefficients, the first set of 
predictor coefficients including a first vector repre- 
sented as {0.45782564, 0.59002827, 0.73704688, 
0.73388197, 0.75903791, 0.74076479, 0.65966007, 
0.58070788, 0.52280647, 0.42738207}; and 
a second set of predictor coefficients, the second set of 
predictor coefficients including a first vector repre- 
sented as {0.14936742, 0.25397094, 0.42536339, 
0.40318214, 0.39778242, 0.34731435, 0.22773174, 
0.17583478, 0.12497067, 0.11001108}. 
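
The switched prediction at the second rate can be sketched as below, purely as an illustration. Only the two first vectors recited in claim 44 are used. The function name `switched_prediction_error`, the first-order element-wise prediction from the previous quantized error, and the choice of the set giving the smaller squared prediction error are all assumptions; the claim does not state how the predictor switch module decides.

```python
import numpy as np

# First vector of each predictor coefficient set, as recited in claim 44.
SET_A = np.array([0.45782564, 0.59002827, 0.73704688, 0.73388197, 0.75903791,
                  0.74076479, 0.65966007, 0.58070788, 0.52280647, 0.42738207])
SET_B = np.array([0.14936742, 0.25397094, 0.42536339, 0.40318214, 0.39778242,
                  0.34731435, 0.22773174, 0.17583478, 0.12497067, 0.11001108])

def switched_prediction_error(lsf, prev_quantized_error):
    """Return (switch, error) for the predictor set with the smaller error energy.

    Assumptions: the prediction is coefficient * previous quantized error,
    element by element, and the switch picks the set that minimizes the
    squared prediction error.
    """
    lsf = np.asarray(lsf, dtype=float)
    prev = np.asarray(prev_quantized_error, dtype=float)
    best = None
    for switch, coeffs in enumerate((SET_A, SET_B)):
        predicted = coeffs * prev          # predicted line spectrum frequencies
        error = lsf - predicted            # prediction error to be quantized
        energy = float(np.dot(error, error))
        if best is None or energy < best[0]:
            best = (energy, switch, error)
    return best[1], best[2]
```

Under these assumptions the encoder would send the value of `switch` along with the quantized error, so the decoder can apply the same coefficient set.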

45. The method of claim 44, where determining predicted 
line spectrum frequencies when the bit rate is the second rate 
comprises selecting a set of predictor coefficients from the 
line spectrum frequency predictor coefficients table with the 
predictor switch module. 

46. The method of claim 44, where modifying the quan- 
tized line spectrum frequencies comprises selecting one of a 
plurality of interpolation paths, the interpolation paths 
derived from a predetermined weighting factor. 

47. The method of claim 44, where modifying the quan- 
tized line spectrum frequencies comprises: 

analyzing the degree of spectral variations between a 
plurality of subframes of a frame of the speech signal; 
and 

selecting an interpolation path as a function of the spectral 
variations. 
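
One way to picture the selection in claim 47 is the sketch below. The function name `select_interpolation_path`, the variation measure (mean squared difference between consecutive subframe vectors), the threshold value, and the two path labels are hypothetical; the claim only requires that the path be chosen as a function of the spectral variations.

```python
import numpy as np

def select_interpolation_path(subframe_lsfs, threshold=0.01):
    """Pick a hypothetical interpolation path from per-subframe spectra.

    subframe_lsfs: one row of line spectrum frequencies per subframe.
    threshold: illustrative value, not taken from the patent.
    """
    lsfs = np.asarray(subframe_lsfs, dtype=float)
    # Change in the spectrum from each subframe to the next.
    diffs = np.diff(lsfs, axis=0)
    # Degree of spectral variation across the frame.
    variation = float(np.mean(np.sum(diffs ** 2, axis=1)))
    # Low variation: favour smooth interpolation; otherwise follow the change.
    path = "smooth" if variation < threshold else "fast"
    return path, variation
```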